Specific indicators of performance skill in pilot training are
lacking. This dissertation represents an effort to begin filling that
void. An algorithmic, performance state evaluation model was developed
for an instrument flight maneuver with performance times and deviations
from a standard flight path as indicators of skill. The algorithm's
initial procedures and these indicators were used in three empirical
investigations. The first investigation showed that performance times
can be used to enable an observer to discriminate between performances
or performance states in performances by two experienced pilots. In
the second investigation, means of total performance time were found
to discriminate between differences in treatments used in a training
experiment with student pilots as subjects (data from Brecke, 1975);
and a priori predictions of differences in effects of these treatments
on variability of group performances at a specified location were
significant.

In the final investigation, support was found for the hypothesis
that a small set of specific indicators could be used to replace a
summary indicator of variability in performances. Results of stepwise
regression analyses indicated that 7 of 12 specific indicators could
be used to account for 3^% to 82% of the variance in the summary
indicator over 6 performance trials and that there were nearly identical
curvilinear trends in means (r = .98) due to improvement over trials
for the summary indicator and deviations from a standard. It was
concluded that with the present maneuver, the model allowed for superior
evaluations with fewer data points. The need to test detailed analytic
procedures in the model and to extend the methods used in the
development of the model to other pilot training maneuvers was discussed.


CHAPTER I


THE  PROBLEM  OF  PERFORMANCE  EVALUATION  IN  PILOT  TRAINING 


This chapter is an introduction to the problem of measurement
and evaluation of skill in pilot training. In pilot training,
evaluations are used to diagnose student pilot learning difficulties, to
manage the training program, and to conduct training research and
development. Because pilots operate a complex system in an unstable,
frequently dangerous environment, problems result from differences
among evaluation methods needed and used in the operational, the
management, and the research settings. To be fully effective, any
formal measurement and evaluation methods must ultimately be usable
in each of these settings. This study was designed to investigate
problems of measuring and evaluating skill in pilot performances from
the view of training research and development.

The thesis of this study was that specific indicators of skill
in pilot performances could be used with a performance state
evaluation model to resolve a measurement dilemma: excessive detail versus
uninformative generality. To establish a general framework for this
thesis, the problems of evaluation which confront the instructor pilot
are described first. This description is followed by a delineation
of problems with the evaluation methods generally used in pilot
training management. Next, the needs for evaluation methods in pilot
training research and development are considered. Finally, the major
points are summarized and the specific purposes of this study are
stated.

Instructor  Pilot  Evaluation  Problems 

Pilots operate a complex system in an unstable, frequently
dangerous environment. Consider how these operational factors make demands
on an instructor pilot (IP) as he evaluates a student pilot's
performance in an aircraft. The IP must monitor the student pilot's (SP)
behavior, the aircraft, and the surrounding airspace. He must identify
inappropriate SP behaviors, unsafe performance conditions, and dangers
in the airspace. When he detects an error or threatening condition,
he must decide what action will best meet the objectives of mission
safety and the training needs of the SP. He must carry out the
selected action. Observations in all these areas of performance must
be remembered or recorded and then used to arrive at a meaningful
score for the performance.

Clearly, the IP must process large quantities of information
to fulfill the requirements of his assignment. As training tasks
become more complex or dangerous, the demands on the IP tend to increase
and IPs will tend to be less and less able to adequately process all
the essential information. If the information processing load on the
IP becomes excessive, the integrity of safety procedures, training
effectiveness, and evaluation methods will be compromised. These
requirements and conditions have resulted in general use of rating
methods to evaluate skill in performances during pilot training.




Evaluation  Methods  in  Pilot  Training 


After more than 30 years, rating scales are still the basis of
evaluation methods in pilot training. Rating scales are used because,
as yet, there are no cost effective evaluation methods to meet the
needs of the operational user, the IP (Koonce, 1974). Some
alternatives to rating scale methods have been investigated. These
alternatives are paper and pencil observation schedules and, more recently,
automated data collection and computer aided evaluation systems.
Rating scales are still used because they meet the needs of training
management without intruding seriously into the IPs' operational
capabilities.

As currently used, rating scale methods are not adequate to meet
the needs of training research and development. Rating scale methods
are inadequate for these needs because observations using them lack
sufficient discrimination, i.e., the observations tend to accumulate
on one or two points in the rating scale. Without better
discrimination, rating methods are not adequate as criteria to validate
alternative methods of measurement and evaluation (Knoop & Welde, 1973). To
make IP observations more discriminating in training research and
development, extensive observer training and quality control programs
must be developed and carried out (Horner, Radinsky, & Fitzpatrick,
1970; Koonce, 1974).

In their present form, IP ratings cannot be used in training
research and development. Consider the problems of standardization
of IP ratings under the present system. Inflight pilot training and
proficiency evaluation are conducted largely on a one-to-one basis;
this means that pilot instructors train IPs and check pilots who, in
turn, train and evaluate SPs. Since training personnel cannot
simultaneously observe and evaluate an SP's performance, they have
little common basis for developing uniform criteria. Rather, their
training and evaluation skills evolve primarily from their personal
experiences in the highly complex aircraft environment. Measures and
criteria developed in training research might be used to improve the
effectiveness of IP ratings.

Christensen and Mills (1967) contended that methods for the
evaluation of complex performances suffer from a criterion problem.
Quoting Thorndike (1947, p. 29), these authors noted that the
criterion problem is one "of obtaining satisfactory criterion measures
against which to validate tests and evaluate variations of training
methods" (Christensen & Mills, 1967, p. 335). Findings from
recent pilot training evaluation studies confirm this observation
(Horner, et al., 1970; Knoop & Welde, 1973). Christensen and Mills
also argued that quantification in the absence of a criterion is
wasted effort: "to generate numbers does not automatically assure
either understanding or validity" (p. 335). As a conclusion these
authors observed that "we seem to be no nearer than we were 20 years
ago to the development of independent, uncontaminated criteria of
human performance under operational conditions" (p. 339).

Seven years later, Koonce (1974) reviewed problems of measuring
and evaluating pilot performances. A criterion problem still existed.
He found that measures derived from a sound theoretical position did
not exist. Rather, he found that trial and error methods were used
with data from every possible measurable source, or alternatively,
that methods of systematic inspection and expert judgment were used
to select measures for particular effects. Training research and
development efforts cannot be effective without valid response
measures and consistent evaluative criteria. Perhaps the problems of
measurement and evaluation validity ought to be viewed in a different
way.


Evaluation  Methods  in  Pilot  Training  Research 

Measures and criteria used in pilot training research and
development might be used as a basis to improve IP evaluation methods. In
training research and development, training objectives, instruction,
and evaluation are three interrelated components. Generally, piloting
skills and competencies, to be developed in training, are identified
in the objectives. Instruction covering these competencies is prepared
and SPs are asked to use this instruction to perform assigned tasks
under specified test conditions.

Iterative procedures are used in training research and
development: SP performances are observed and "if the student's responses do
not correspond with the specified outcomes, the materials are revised
and the process is repeated" (Merrill, 1971, p. 2). In this iterative
process, it is generally assumed that the necessary response measures
and criteria exist and that relationships between measures and
criteria are well known. Evidently, such assumptions must be questioned
in the area of pilot training (Brecke & Gerlach, 1972). It follows
that effective pilot training research efforts must include an
investigation of measures of skill in the area of interest (Koonce, 1974;
Shipley, Gerlach, & Brecke, 1974). The results of these and related
investigations should be studied to improve our understanding of
measures of skill and methods of evaluation in pilot training.

Summary 

Piloting performances are complex because they are composed of
many events. Some of these events will be more crucial to the
attainment of mission objectives than others. Critical events probably occur
at different times or places during a performance but at similar times
and places for repeated performances of the same task. For the purposes
of measurement and evaluation of skill in pilot training, it would be
helpful to know the characteristics of these critical events. For a
given task, it should be possible to determine empirically such factors
as the relative importance and the time or placement of such events in
the operational sequence. Given such empirical data, an evaluator would
have a basis to develop objective performance measurement and evaluation
standards.

A training research approach to measurement and evaluation
problems in pilot training should include these objectives:

1. To identify potentially critical points or events in
descriptions of pilot behaviors that make up the operational sequence
of the performance tasks.

2. To develop observation schedules and scoring procedures to
account for the effects of these events on performance skill.

3. To determine empirically relative frequencies of these
critical events throughout performances of the assigned task from
objective data.

4. To train IPs, check pilots, and other pilot training
personnel to employ the schedules and procedures with SP performances,
first in a simulator, then in the aircraft.

5. To develop reliability assessment procedures for use with
measurement and evaluation practices in the aircraft based on the
outcomes of the four preceding objectives.

In the present study, the first three objectives were
investigated from the view of a pilot training research and development
project. A secondary purpose was to investigate measures and methods that
might be usable in the operational environment. The primary purpose
was to identify or develop indicators of performance skill that would
resolve a recurring dilemma in pilot training measurement and
evaluation: excessive detail versus uninformative generality (Youtz &
Ericksen, 1947). The hypothesis was that a set of specific indicators could
be used to replace a set of more general, summary indicators in
training research and development. In the context of the present study,
specific indicators were measures of performance events at particular
points in a performance. In contrast, summary indicators were
measures obtained as a function of sums computed from each observation
in an entire set of time series data from the same performance.
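
The distinction between the two kinds of indicator can be illustrated
with a minimal sketch. It assumes that the summary indicator is a
root-mean-square deviation from the standard and that a specific
indicator is the signed deviation at a chosen sample; neither formula
is taken from the study, and the function names and altitude values
are hypothetical.

```python
from math import sqrt


def summary_indicator(observed, standard):
    """Root-mean-square deviation computed from every sample in the series.

    Assumption: RMS is used here only as one example of a 'function of sums'.
    """
    if len(observed) != len(standard):
        raise ValueError("series must have equal length")
    squared = [(o - s) ** 2 for o, s in zip(observed, standard)]
    return sqrt(sum(squared) / len(squared))


def specific_indicators(observed, standard, points):
    """Signed deviation from the standard at selected sample indices only."""
    return {p: observed[p] - standard[p] for p in points}


# Hypothetical altitude samples in feet, not data from the study.
standard = [1000, 1100, 1200, 1100, 1000]
observed = [1000, 1090, 1230, 1110, 1000]

summary = summary_indicator(observed, standard)
specific = specific_indicators(observed, standard, points=[2])
```

The summary value needs every sample in the series, while the specific
indicator needs only the observation at the selected point (here the
sample at maximum altitude).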


CHAPTER II


THEORETICAL  ANALYSIS  AND  REVIEW  OF  LITERATURE 


In this chapter, a theoretical analysis is used to define a
dependent variable, precision of control, and to relate it to
characteristics of skill in piloting performances. This analysis is followed by
a review of the pilot training literature on selected measurement and
evaluation methods. The objective of the review was to evaluate
measures and methods that might be used as indicators of precision of
control. The first two sections of the chapter contain the results
of the theoretical analysis. In the third section, response measures
and methods are reviewed. In the fourth section, a simple performance
state evaluation model is described. The last section is a summary
of the chapter.

In  this  chapter,  the  concept  of  a "state"  from  control  theory 
is  used  to  refer  to  segments  of  a flight  path.  It  is  assumed  that 
training  objectives  designate  the  characteristics  of  the  flight  path 
to  be  used  in  student  performances.  Referring  to  Figure  1,  these 
training  objectives  prescribe  the  system  inputs  to  the  pilot  and 
define  the  characteristics  of  the  flight  path  expected  as  system 
performance  outputs.  As  used  in  the  present  study,  the  evaluation 
model  is  essentially  a process  of  comparison  between  specified  and 
measured performances.

As used here, "state" is a measurement- or output-determined
concept. This simple application of "state" differs somewhat from the
sense of more formal definitions in which "state" is used to represent
the internal workings of some dynamic system, i.e., the pilot (Figure
1). In previous research, Connelly, Schuler, and Knoop (1969) began
with a similar definition of a state. However, their purpose was to
mathematically simulate the behavior of a human evaluator. To do this
modeling operationally, they used adaptive mathematical methods on a
computer without any prior assumptions about relevant performance
measures or criteria, i.e., inputs or system performance variables.

Describing  Complex  Performances 

Uncertainty is a part of every pilot's, every instructor's, and
every evaluator's mission in pilot training. An aircraft's flight path
is not visible and a pilot's control behaviors leave no permanent trace.
Air is unstable and weather factors cause unpredictable variations in
the performance. Changes in a flight path will also occur because the
airspace must be shared with other aircraft and because obstacles must
be avoided over mountainous terrain or at low altitudes. Pilots,
instructors, and evaluators also possess the attribute,
"unpredictability." To describe a flight path under any of these conditions, an
evaluator has been faced with a choice between unmanageable detail on
the one hand or uninformative generalities on the other (Youtz &
Ericksen, 1947).

In the pilot-aircraft system one of the pilot's functions is to
control the aircraft (Figure 1). The results of this control are
reflected through feedback as values on the aircraft instruments and
as perceptual cues. An example of a visual cue is the orientation and
location of the horizon relative to the aircraft altitude and attitude.
A pilot uses these values and cues as feedback to make decisions about
any changes in the controls that are needed to accomplish his mission.

Mission objectives, control changes, instrument values, and cues can
be systematically related to each other through the principles of
control theory. These principles are a useful tool in specifying values
and cues in a standard training maneuver, in determining a list of
control movements essential to complete a prescribed flight path, and
in the development of evaluation methods (Brecke & Gerlach, 1972;
Gerlach, Brecke, Reiser, & Shipley, 1972; Shipley, Gerlach, & Brecke,
1974).

To approximate a standard flight path as he executes a maneuver,
a pilot uses information from instruments, references outside the
aircraft, and the mission objectives. The precision of control exhibited
by the pilot in maintaining a flight path will vary with such factors
as his alertness, his level of piloting skill, the varying requirements
of the mission, and differences in flight activities. For example, in
the transition from climb to descent in the Vertical S-A (Figure 2),
variability in maximum altitude was found to discriminate between
groups of student pilots given different sets of preflight instructions
(Brecke, Gerlach, & Shipley, 1974). Precision of control is an
important dependent variable to consider in evaluating pilot performances.

The correctness of a pilot's control of the aircraft is
evaluated by how well it approximates a "standard." The standard should be
specified in the performance objectives for a mission. Precision, a
concept which expresses the degree of correspondence between an
observation and a standard, is estimated by the variability between
experimental observations or measurements and a standard flight path.
Precision is estimated from experimental data with such statistics as the
variance and the covariance. Probability functions applied to these
statistical measures enable one to interpret degree of precision in
flight path data.
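
As an illustration of how such statistics might be computed, the sketch
below estimates variability about a standard (rather than about the
sample mean) and the covariance between deviations on two variables.
The function names, the choice of denominators, and the altitude and
airspeed values are assumptions made for this example and are not taken
from the study.

```python
def deviations(observed, standard):
    """Signed differences between observed and standard values."""
    return [o - s for o, s in zip(observed, standard)]


def variance_about_standard(observed, standard):
    """Mean squared deviation from the standard flight path values."""
    d = deviations(observed, standard)
    return sum(x * x for x in d) / len(d)


def covariance_of_deviations(dev_a, dev_b):
    """Sample covariance (n - 1 denominator) between two deviation series."""
    n = len(dev_a)
    mean_a = sum(dev_a) / n
    mean_b = sum(dev_b) / n
    products = ((a - mean_a) * (b - mean_b) for a, b in zip(dev_a, dev_b))
    return sum(products) / (n - 1)


# Hypothetical samples: altitude in feet, airspeed in knots.
alt_dev = deviations([1010, 990, 1020, 980], [1000, 1000, 1000, 1000])
spd_dev = deviations([158, 162, 156, 164], [160, 160, 160, 160])

alt_var = variance_about_standard([1010, 990, 1020, 980], [1000] * 4)
alt_spd_cov = covariance_of_deviations(alt_dev, spd_dev)
```

In this made-up example the covariance is negative: altitude above the
standard coincides with airspeed below it.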

If precision is to be a useful concept in evaluating student
pilot performances, the evaluator must satisfy a set of basic
requirements. These requirements are:

1. The standard flight path must be accurately defined.

2. The experimental or test maneuver must be accurately
observed or measured.

3. Rules for estimating precision must be consistently applied
to the data.

4. The appropriate probability function must be applied to the
estimates for an accurate interpretation of precision.

Methods for satisfying these four requirements must be identified or
developed and the validity of any methods, however obtained, must be
empirically demonstrated.

Standard  Flight  Path 

In the general case, specification of the standard flight path
for test purposes is arbitrary (Etkin, 1972). Horner, Radinsky, and
Fitzpatrick (1970) reviewed descriptions of the standard training
maneuvers used in the Air Force Undergraduate Pilot Training (UPT)
syllabus. They concluded that ideal or standard flight path values
were adequately specified. That is, they were able to define rating
scales for instructor pilots to evaluate video-taped performances.

Knoop and Welde (1973) arrived at a different conclusion. In
an investigation of automated data collection and evaluation (ADCS)
methods, they found that several descriptions of UPT maneuvers were
inadequate for developing computerized evaluation methods. In some
cases the descriptions contained insufficient information for
developing the necessary equations. In other cases, performances of
experienced instructor pilots were found to differ in form from the standard
flight path. Finally, they found that instructor pilots were unable
to perform some of the maneuvers as specified because the standard
flight paths were beyond the performance limits of the pilot-aircraft
system. Knoop and Welde (1973) concluded that empirical methods must
be used to support a control theory analysis of descriptions given in
manuals and textbooks or by instructor pilots.

Application  of  Control  Theory 

Performance  States 

Pilot training researchers or performance evaluators can use
control theory as an aid in understanding the concept of a standard
flight path. One can describe movements of an aircraft through space
and time (i.e., the flight path) with values from several variables.
Examples of these variables include, among others, altitude, heading,
airspeed, pitch, and power. Each set of values on these variables is
called a state of the system.

A state is the complete set of variables needed to describe the
entire performance of the pilot and the aircraft at any instant in
time (Etkin, 1972). A state, as defined here, will contain more
variables than needed by a pilot, an evaluator, or a training researcher.
One objective of a control theory analysis is to select those items in
a set of state variables that are necessary and sufficient to describe
a standard flight path from start to finish. Brecke and Gerlach (1972)
developed a model which uses descriptions of maneuvers given in the UPT
syllabus as a basis for selecting the minimum set of variables and
their values.
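
The idea of a state as a set of values, and of a reduced set of
variables selected from it, can be sketched as a simple data structure.
The field names and units below are hypothetical, and the two variables
retained in the example are chosen only for illustration; they are not
the minimum set identified by Brecke and Gerlach (1972).

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class State:
    """One instantaneous state: a value for each recorded variable."""
    time_s: float
    altitude_ft: float
    heading_deg: float
    airspeed_kt: float
    pitch_deg: float
    power_pct: float


def project(state, variables):
    """Keep only the named variables from a full state."""
    full = asdict(state)
    return {name: full[name] for name in variables}


# Hypothetical values for a single instant.
s = State(12.0, 15000.0, 90.0, 160.0, 5.0, 85.0)
reduced = project(s, ("altitude_ft", "airspeed_kt"))
```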

In time, a flight path is continuous and actually consists of
an unlimited number of instantaneous states. Practically, however,
the number of states must be limited by the number of discrete
observations that can be made per specified unit of time. Pragmatically,
observations taken at extremely high rates on a large number of
variables result in quantities of data that are difficult to manage and
process with a computer. The concept of a state can be extended using
the concept of precision to obtain a solution to this problem of
excessive detail.

Steady  States  and  Transition  States 

Many observations in a series on a system variable will be
approximately the same value. A series of observations with
approximately the same values can be regarded as a set of samples from a single
state. This extension of the concept of "state" is equivalent to
extending the time frame from an instant to seconds, minutes, or longer
periods. An extended state composed of contiguous samples with
approximately equal values is called a steady state. It follows that length of
time is a variable to include in the definition of extended states.
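
One way to make the notion of an extended state concrete is to group
contiguous samples that stay within a tolerance of the first sample in
the run. The sketch below does this under stated assumptions: the
tolerance, the minimum run length, the one-second sample period, and
the altitude values are all hypothetical, and the grouping rule is an
illustration rather than the procedure used in the study.

```python
def steady_states(samples, tolerance, min_length=2):
    """Find runs of contiguous, approximately equal samples.

    A run continues while each sample is within `tolerance` of the run's
    first sample. Returns (start, end) index pairs with `end` exclusive.
    """
    runs = []
    start = 0
    for i in range(1, len(samples) + 1):
        run_ends = i == len(samples) or abs(samples[i] - samples[start]) > tolerance
        if run_ends:
            if i - start >= min_length:
                runs.append((start, i))
            start = i
    return runs


# Hypothetical altitude samples (feet) taken once per second.
altitude = [1000, 1002, 999, 1001, 1050, 1100, 1150, 1200, 1201, 1199, 1200]
runs = steady_states(altitude, tolerance=5, min_length=3)

# Length of time for each extended state, assuming a 1.0 s sample period.
durations = [(end - start) * 1.0 for start, end in runs]
```

Samples that fall between two detected runs (indices 4 through 6 here)
are the ones that would be treated as belonging to a period of change
between steady states.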

Few maneuvers in the UPT syllabus are purely steady states. All
missions and most training maneuvers will consist of a sequence of
steady states. The pilot must execute a sequence of changes in the
aircraft controls to go from one steady state to the next. The series
of instantaneous states in each period of change in control settings
are grouped in an extended set called the transition state. Transition
states are much more difficult to define and to work with in evaluating
pilot performances (Connelly, Schuler, & Knoop, 1969; Dickman, 1974).

Transition states are of special concern in evaluating student
pilot performances for two reasons. First, new behaviors to be learned
are usually located in the transition states (Gerlach, et al., 1972).
For example, in the transition from climb to descent in the Vertical
S-A, the student pilot must learn to coordinate smooth, gradual changes
in pitch and power controls (see Figure 2). Second, the frequency of
control changes specified in the mission objectives will be greatest at
the transitions. Therefore, the probability of errors in changing the
controls will be at a maximum in the vicinity of transition states.

¹Some researchers, e.g., Leshowitz and Neilsen (1974, 1975),
investigating the problems of evaluation in UPT have chosen not to
work with the transition states for these reasons.

Control Errors and Precision of Control

How are control errors and precision of control related? If the
preceding analysis is correct, the largest variabilities in flight path
data will be found in the vicinity of the transitions. For such
conditions, the variance, a statistical measure of variability, is time
dependent (Jacobs, 1969). Figure 3 depicts a set of time dependent
observations on altitude and Figure 4 illustrates a time independent
set. In Figure 3 each of the altitude curves exhibits a similar
pattern of variation from the standard in time. The curves in Figure 4
reveal no discernible common pattern.
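
A time dependent variance can be illustrated by computing variability
about the standard separately at each sample time across several
performances. The sketch below is a minimal example under assumed
conditions: the three trials and the standard are made-up altitude
values with a transition at the middle sample, and the function name is
hypothetical.

```python
def variance_by_time(trials, standard):
    """Mean squared deviation from the standard at each time index,
    taken across all trials."""
    n_trials = len(trials)
    return [
        sum((trial[t] - standard[t]) ** 2 for trial in trials) / n_trials
        for t in range(len(standard))
    ]


# Hypothetical altitude curves (feet); the climb-to-descent transition
# is at index 2.
standard = [1000, 1100, 1200, 1100, 1000]
trials = [
    [1000, 1100, 1230, 1100, 1000],
    [1000, 1090, 1240, 1110, 1000],
    [1010, 1100, 1220, 1100, 990],
]

profile = variance_by_time(trials, standard)
peak_index = max(range(len(profile)), key=profile.__getitem__)
```

When the variance is time dependent in the sense described above, the
profile is not flat; in this example it is largest at the transition
sample.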

To the extent that similar control errors occur at a given
transition and lead to similar deviations from the standard flight
path, knowledge of any dependencies between time and variability would
be useful. Methods need to be developed to infer the most probable
control errors from objective data and to guide instructor pilot
observations. Information about time dependencies could also be used
diagnostically by instructional designers and instructor pilots to
improve training. Researchers working to develop automated
evaluation systems need this information to accurately specify criteria for
processing data in the vicinity of the transitions (Dickman, 1974).

Figure 4. Time Independent Deviations From Vertical S-A Altitude Curve

Indicators Used in Pilot Training

Little is known about the specific control behaviors of pilots
in the aircraft (Reid & Etkin, 1972). Existing models of pilot
behavior are based on rigidly prescribed tracking tasks in the laboratory.
Some practical knowledge about these behaviors is reflected
in the accumulated experiences of instructor pilots and is communicated
as flight line lore (Reiser, Brecke, & Gerlach, 1972). However,
instructor pilots are likely to disagree about proper control techniques
and about how to interpret deviations from a standard flight path
(Horner, Radinsky, & Fitzpatrick, 1970; Knoop & Welde, 1973; Reiser,
et al., 1972). In the development of automated evaluation systems,
questions about the meaning of deviations from the standard flight path
must be solved (Connelly, et al., 1969; Dickman, 1974; Hill & Goebel,
1971; Knoop & Welde, 1973; Shipley, Gerlach, & Brecke, 1974).

Subjective  Indicators 

Rating scales are the predominant method of measuring pilot
performance in the aircraft. A four-point scale (excellent, good, fair,
and unsatisfactory) is used in undergraduate pilot training (UPT). The
instructor pilot rates each maneuver during a student's training flight
and he also gives an overall rating for the entire flight. Miller
(1947) reviewed rating scales and paper and pencil observation
schedules that were researched in the World War II pilot training studies.
He concluded that improvements in evaluation methods would not be
accomplished until automated data collection systems (ADCS) were available.

Instructor pilot ratings are not likely to be helpful in solving
the problem of deviations between an observed and a standard flight
path. Horner, et al. (1970) investigated the effects of differences
between video taped performances on instructor pilot ratings. As
student or instructor pilots performed specified UPT maneuvers in the
aircraft, a video camera inside the cockpit was used to record views
of ground reference points and of the instrument panel. A set of these
taped performances was selected to represent a range of variations from
a video tape representing the standard flight path. Instructor pilots
then rated these taped performances on a ten-point scale of
performance quality. The ten-point scales were an expansion of the regular
UPT four-point rating scale and the instructors used these expanded
scales to rate segments of each maneuver. Horner, et al. found that
variability among the ratings increased with the extent of the
deviations between the rated tapes and the tape of the standard flight path.
Horner, et al. concluded that there are as many formulas for
interpreting deviations as there are instructors.

Knoop and Welde (1973) had instructors and students perform a
series of different UPT maneuvers in an aircraft. Measures of the
system variables were recorded throughout each flight with an ADCS.
At the completion of each maneuver during the flight, both the
performer, a student or an instructor, and the accompanying instructor
rated the performance. These ratings were made using the regular UPT
four-point scale and each rating covered an entire maneuver. These
subjective ratings were correlated with a number of different objective
measures obtained from the ADCS recordings. On the basis of these
correlations, Knoop and Welde concluded that instructor pilot ratings
lacked standardization and that comparisons between ratings by
different instructors would be unreliable.

Koonce (1974) used different methods and obtained large correlations
(r ≥ .80) between pairs of raters. Instrument-rated, highly qualified
pilots were tested with a series of maneuvers for a commercial pilot's
instrument rating. Certified flight instructors were used in pairs to
observe each performance and they recorded data using a paper and
pencil observation schedule. The observers were briefed on the use of
the observation schedule, given training on the observation task, and
given experience in performing the task in a flight simulator. No
objective criteria were used in training the observers; when an
observer reported that he was ready, he was assigned to the roster of
observers. Each observer was paired with every other observer in a
counter-balanced design and each pair collected data in both the
simulator and the aircraft.

The  studies  reviewed  in  this  section  lead  to  two  conclusions. 
First,  instructor  pilots  must  be  trained  to  give  standard  evaluations 
of  variability.  Second,  criteria  must  be  developed  so  that  deviations 
between  a standard  flight  path  and  observed  data  can  be  consistently 
interpreted.  It  follows  that  standard  criteria  must  be  developed 
before  instructors  can  be  trained  to  give  standard  evaluations.  If 
comparisons  between  simulator  and  aircraft  training  are  to  involve 
the  use  of  instructor  ratings,  similar  criteria  must  be  used  in  each 
training situation. Practically, advanced simulation systems can be
used to develop standard criteria and to train the instructor pilots
to use these criteria.

Objective Indicators

One problem is to obtain objective indicators of variability in
pilot performance data which can be used as criteria to evaluate
precision of control. In studies reviewed on this problem, the
indicators were computed from objective data with automated systems.
To obtain objective data from pilot performances in an aircraft or in
a flight simulator, special electronic ADCS devices are connected to
selected system output variables (Hill & Goebel, 1971; Knoop & Welde,
1973; Shipley, et al., 1974). These ADCS devices sample and compute
summary measures or record data points from each output variable at
small, fixed intervals of time, e.g., one discrete data point per
variable per second. Data obtained with these devices form a time
series and, for the purposes of evaluating pilot performances,
conventional indicators of variability may not be adequate (Winer,
1971).

A standard deviation is a conventional measure of precision in
a set of data. Hill and Goebel (1971) found some weak evidence to
support the use of the standard deviation as a measure of performance
skill in pilot training. They computed standard deviations and other
measures with a small on-line computer attached to a GAT-1 simulator.
The measures were computed from performance variables that were
assigned constant values in the testing tasks, i.e., variables with
changing values were not sampled. Methodologically, 266 different
statistical values were computed and tested with analysis of variance
for the differences among three performance groups at the .05 level.
Forty tests were significant (.05) and 13 of these were standard
deviations.

There is reason to question the use of a sample standard deviation
and similar estimators of precision of control when computed across
a set of time series data. The problem arises if a sample measure of
central tendency is used as the basis of the measure of deviations. If
the data are sampled from a nonlinear variable and the sample mean is
used as the measure of central tendency, the measures of precision will
be too large, thus underestimating the real level of skill. A second
case of inaccurate estimation occurs if all sampled observations
deviate in the same direction relative to an assumed standard, e.g.,
all positive. In this latter case, estimators of precision calculated
from the sample mean will be too small, thus overestimating precision.
Clearly, estimates of precision of control must involve a reference
to some independent standard.
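
The two cases of inaccurate estimation can be illustrated numerically.
The following Python sketch is offered only as an illustration; the
altitude values, the curved standard path, and the function names are
hypothetical and are not taken from any of the studies reviewed. It
contrasts a standard deviation computed about the sample mean with a
root mean square deviation computed about an independent standard.

```python
import math


def sd_about_mean(values):
    """Sample standard deviation computed about the sample mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def rms_about_standard(values, standard):
    """Root mean square deviation from the standard value at each sample."""
    squared = [(v - s) ** 2 for v, s in zip(values, standard)]
    return math.sqrt(sum(squared) / len(values))


# Case 1 (hypothetical): the standard itself follows a curve and the
# pilot flies it perfectly. The sample mean is a poor reference here.
standard_curve = [15000 + t ** 2 for t in range(10)]
flown_perfectly = list(standard_curve)
case1_sd = sd_about_mean(flown_perfectly)                        # large
case1_rms = rms_about_standard(flown_perfectly, standard_curve)  # zero

# Case 2 (hypothetical): the standard is constant and every observation
# is about 100 feet high, i.e., all deviations are positive.
level_standard = [15000.0] * 5
flown_high = [15098.0, 15101.0, 15100.0, 15099.0, 15102.0]
case2_sd = sd_about_mean(flown_high)                        # small
case2_rms = rms_about_standard(flown_high, level_standard)  # about 100

print(round(case1_sd, 2), round(case1_rms, 2))
print(round(case2_sd, 2), round(case2_rms, 2))
```

In the first case the mean-based estimate signals poor precision for a
perfect performance; in the second it signals good precision for a
performance that is consistently off the standard.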

In describing relationships between training objectives and
requirements for evaluation in pilot training, Brecke and Gerlach
(1972) consider the use of performance limits:

It is impossible for either the IP or the SP to make
more than a very rough subjective evaluation of pilot
performance unless some objective criterion limits are
prescribed which, at the very least, permit a distinction
between acceptable and unacceptable performance.
Ideally, however, such performance or criterion limits
should clearly and objectively define ranges of
performance in accordance with the grading and evaluation
system currently in use. (p. 10)

A performance limit is a value that can be used to differentiate between
acceptable and unacceptable performances (Figure 5). As an indicator
of acceptable performance, a performance limit expresses the maximum
allowable variability from the standard. Percent time on criterion is
one indicator of central tendency used with performance limits (Fitts,
Bahrick, Briggs, & Noble, 1959).

Percent time indicators should not be used exclusively to evaluate
pilot performances (Fitts, et al., 1959). Three weaknesses of
dichotomous performance limit indicators have been identified: (a)
difficulty in specifying the size of the limits; (b) sensitivity of
limits to changes in variability; and (c) need to evaluate effects of
training on errors in performance data. There are difficulties in
specifying the size of the performance limits. Horner, et al. (1970)
and Knoop and Welde (1973) have shown that instructor pilot judgments
are not based on reliable standards and should not be used exclusively
to determine the size of performance limits. Knoop and Welde recommend
that within-subject sampling procedures be used as the basis for
defining objective measures. By within-subject sampling, Knoop and
Welde mean that each pilot performs a series of trials on the assigned
maneuver. With such performances, Fitts, et al. (1959) report that the
standard deviation, measured from the group mean, will give maximum
discriminations between different performances.

Relative to the standard, estimates of learning effects will be
sensitive to differences in the size of the performance limits (Fitts,
et al., 1959). Early in training, if the limits are too small, large
changes in variability will not be detected. Near the end of training,
small changes in variability will not be detected if the limits are too
large. To the extent that changes in variability reflect improvements
in precision of control, estimates of learning based exclusively on
percent time indicators will not be valid in pilot training.

Potential interactions caused by changes in the performance
limits are seen in data abstracted from Shipley, et al. (1974). Fitts,
et al. (1959) described a procedure for estimating percent time on
criterion from the normal probability distribution. Shipley, et al.
used these procedures to obtain percent time estimates with different
performance limits from the same set of data. Evidence for potential
interaction effects can be seen in these results (Table 1).
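
The dependence of percent time estimates on the size of the limits can
be sketched numerically. The Fitts, et al. (1959) procedure is not
reproduced in the text, so the following Python sketch assumes only
that deviations from the standard are approximately normal with a
given mean and standard deviation. The function names and the two
groups are hypothetical. The proportional increase follows the
(B-A)/A definition used in Table 1.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def percent_time_on_criterion(mean_dev, sd_dev, limit):
    """Estimated percent of time within +/- limit of the standard,
    assuming normally distributed deviations (an assumption of this sketch)."""
    upper = (limit - mean_dev) / sd_dev
    lower = (-limit - mean_dev) / sd_dev
    return 100.0 * (normal_cdf(upper) - normal_cdf(lower))


def proportional_increase(a, b):
    """Proportional increase (B - A) / A between two percent time values."""
    return (b - a) / a


# Hypothetical groups: (mean deviation, SD of deviations) in feet.
groups = {"precise": (0.0, 40.0), "variable": (0.0, 120.0)}
for name, (mean_dev, sd_dev) in groups.items():
    a = percent_time_on_criterion(mean_dev, sd_dev, 50.0)    # 50-foot limit
    b = percent_time_on_criterion(mean_dev, sd_dev, 100.0)   # 100-foot limit
    print(name, round(a, 1), round(b, 1), round(proportional_increase(a, b), 2))
```

Under these assumptions, doubling the limit changes the estimate by a
different proportion for each group, which is the kind of interaction
between limit size and group variability at issue here.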

In a simple dichotomous evaluation model, there is no way to
evaluate the effects of training on the unacceptable parts of a
performance. The concept of error amplitude, illustrated in Figure 6,
can be used to evaluate these data. Mathematically, error amplitude is
expressed as the ratio of the amount by which an error observation
exceeds the nearer performance limit to the size of the limit.
Statistically, error amplitude is summarized over an entire
performance as a root mean square (RMS). The mean square is computed
from the total number of observations in the data, T, or from the
number of deviations, D. For the data from the example in Figure 6,
T = 13 and D = 4, and the error amplitudes E(T) and E(D) are:

$$E(T) = \sqrt{\frac{1.1^2 + 2.8^2 + 1.4^2 + (-0.4)^2}{13}} = .93$$

and

$$E(D) = \sqrt{\frac{1.1^2 + 2.8^2 + 1.4^2 + (-0.4)^2}{4}} = 1.67.$$

TABLE 1

CHANGES IN TWO STATISTICAL MEASURES CAUSED
BY DIFFERENT CRITERION LIMITS^a

Percent Time on Heading: Standard Deviations

Performance      Criterion Limits                  Proportional
Group            1 Degree (A)    5 Degrees (B)     Increase (B-A)/A

1                 6.45           21.44             2.32
2                 5.17           18.33             2.55
3                 5.29           17.54             2.32

Percent Time on Altitude: Means

Performance      Criterion Limits                  Proportional
Group            50 Feet (A)     100 Feet (B)      Increase (B-A)/A

1                61.87           79.98              .29
2                40.85           68.94              .69
3                36.31           52.06              .43

^a Data taken from Table 3 (p. 33) and Table 6 (p. 36), Shipley,
et al., 1974.

Conceptually, the measure E(T) indicates the average rate of error
magnitude over all observations and E(D) indicates the average size of
each error. Each of these error amplitude indicators is an estimate of
the precision of control in a set of data.
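
The two summaries can be checked with a few lines of Python. This is a
minimal sketch, not the original ADCS computation. It uses the four
error amplitudes from the Figure 6 example (1.1, 2.8, 1.4, and -0.4)
with T = 13 observations; the function name is hypothetical.

```python
import math


def error_amplitudes(amplitudes, total_observations):
    """Return (E(T), E(D)) for the signed error amplitudes of one performance.

    E(T) divides the sum of squares by all T observations.
    E(D) divides it by the D observations outside the limits.
    """
    if not amplitudes:
        return 0.0, 0.0  # no observation exceeded the performance limits
    sum_of_squares = sum(a ** 2 for a in amplitudes)
    e_t = math.sqrt(sum_of_squares / total_observations)
    e_d = math.sqrt(sum_of_squares / len(amplitudes))
    return e_t, e_d


# Figure 6 example: T = 13 observations, D = 4 outside the limits.
figure6_amplitudes = [1.1, 2.8, 1.4, -0.4]
e_t, e_d = error_amplitudes(figure6_amplitudes, total_observations=13)
print(round(e_t, 2), round(e_d, 2))  # 0.93 1.67
```

Because E(T) spreads the same sum of squares over all observations, it
can never exceed E(D) for a given performance.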

Shipley, et al. (1974) combined error amplitude, E(T), with an
indicator of time on criterion, called hit rate, to evaluate student
pilot performances in a training experiment. Hit rate is the number of
observations within the performance limits divided by the total number
of observations. Brecke (1975) and Brecke, Gerlach, and Shipley (1974)
evaluated the effects of preflight instructions on student performances
in a flight simulator with hit rate, error amplitude, and the Fitts,
et al. (1959) percent time on criterion indicators. In the Brecke,
et al. (1974) study, there were no significant effects on any
performance variable with any of these indicators. In the Brecke (1975)
study, both error amplitude, E(T), and Fitts, et al. percent time
indicated a significant interaction between type of instruction and
performance trials. Hit rate showed no effects in either study. The
source of the difference was found in rate of improvement in precision
of control across trials. Students given the experimental instructions
performed consistently well across all learning trials. Those given
regular instructions showed definite improvements in precision of
control on the first two trials.

Alternative  Indicators 

Performance limit indicators are effective to a limited extent
in the comparisons of group performances on entire trials. These
summary indicators obliterate details and become increasingly
insensitive to specific deviations as the number of observations
increases. Indicators that are more sensitive to differences in
performance states might be even more effective. Maximum deviation and
performance time are two indicators that can be used in place of those
based on performance limits.

A review of indicators used in previous pilot training research
produced three possibilities which might be used in a preliminary, then
a detailed analysis: the range, maximum deviation from the standard,
and performance time. Hagin (1947) established the range and maximum
deviation as objective indicators in the evaluation of instrument
flying skills. In parallel research during World War II, performance
time was also studied (Miller, 1947). Each of these indicators is
considered in the context of the present research. In the present
research, the emphasis is on the use of comprehensive data collected
and evaluated by automated devices. In the previous research,
instructor and check pilots had to record the data using paper and
pencil.

Miller (1947) summarized the results of measurement research in
pilot training carried out in the World War II military studies. Three
methods of scoring were compared: (a) time sampling of deviations at
specified intervals; (b) the range; and (c) maximum deviation. As
compared to instructor ratings, each of the methods was equally valid.
Time sampled deviations were more reliable than range and maximum
deviation on trials in the same performance. These deviations were
generally equivalent to the range and maximum deviation on test-retest
reliabilities across days and observers. However, at that time it was
difficult to use time sampling procedures and limited evidence suggested
that the range might be more reliable on test-retest than the maximum
deviation.

The deviations method can be used with ADCS devices because
these devices collect data with fixed-interval time sampling methods.
If the objective is to evaluate variability relative to a standard,
deviations are preferable to the range on logical grounds. Maximum
deviation will be a better indicator of skill in maintaining a standard
than the range under certain conditions. These conditions occur if the
two values defining the range are located (a) about equally distant but
in opposite directions from the standard, or (b) at some distance in
the same direction from the standard. For two values located in
opposite directions from the standard, the range may be as much as
twice as large as the maximum deviation. For values located in the
same direction from the standard, the maximum deviation will equal or
exceed the range (see Figure 3, page 18).
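
Conditions (a) and (b) can be made concrete with a small numerical
sketch. The altitude samples, the 15,000-foot standard, and the
function names below are hypothetical and chosen only to show how the
range and the maximum deviation diverge.

```python
def value_range(observations):
    """Range: largest observation minus smallest observation."""
    return max(observations) - min(observations)


def maximum_deviation(observations, standard):
    """Return (signed deviation, sample index) of the observation that
    lies farthest from the standard. Ties go to the earliest sample."""
    index = max(range(len(observations)),
                key=lambda i: abs(observations[i] - standard))
    return observations[index] - standard, index


standard = 15000.0  # hypothetical constant standard altitude in feet

# Condition (a): extremes about equally distant on opposite sides.
opposite = [15000.0, 15060.0, 15010.0, 14940.0, 14990.0]
# Condition (b): both extremes on the same side of the standard.
same_side = [15080.0, 15120.0, 15150.0, 15110.0, 15090.0]

print(value_range(opposite), maximum_deviation(opposite, standard))
print(value_range(same_side), maximum_deviation(same_side, standard))
```

In the first series the range (120 feet) is twice the maximum
deviation (60 feet); in the second the maximum deviation (150 feet)
exceeds the range (70 feet), and its sign shows that the pilot was
consistently high.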

In time series data with the conditions just described, maximum
deviation will be a better indicator of precision than the range
because maximum and minimum observations are not independent. There
is no possibility of finding these two values adjacent to each other
in any small time interval, say several seconds. In such data, a
maximum deviation and its direction from the standard is more
informative than the range because it tells us that deviations in some
related small time interval will be similar in direction and magnitude.
Therefore, a maximum deviation and its time of occurrence will locate a
pattern of deviations in a performance; the range will not.

Performance time is not as well established as an indicator of
performance quality in pilot training as either the range or maximum
deviation. In the World War II studies, performance time was found
to be a good indicator for some maneuvers but not for others (Miller,
1947). As a difficult measure for pilots to observe, it was dropped
from the early observation schedules. Some recent evidence from data
collected with ADCS in the aircraft (Knoop & Welde, 1973) and in a
simulator (Shipley, et al., 1974) supports a need to further
investigate performance time.

Performance time was found to be a good indicator in two recent
studies. In a training experiment, Brecke, et al. (1974) found that
differences in means and variances for performance time differentiated
between treatment groups. The Brecke, et al. data were obtained
automatically from student pilot performances in a flight simulator
(Shipley, et al., 1974). Knoop and Welde (1973) obtained 87 different
objective indicators from data with ADCS methods in an aircraft. These
data included performances by senior instructor pilots, instructor
pilots, and student pilots. The 87 objective indicators were
correlated with two different ratings of each performance, one by the
performer and one by the instructor. Of the 87 different indicators,
7 (8%) were time measures; of the 77 significant (.05) correlations
obtained, time measures accounted for 11 (14%).

In the World War II studies, performance time was found to be
a significant indicator of performance quality if maneuver objectives
prescribed starting and ending points and the form of the desired
flight path between them, e.g., a maximum performance turn (Miller,
1947). Other maneuvers and maneuver states in the pilot training
curriculum possess similar characteristics. In some cases, the desired
flight path is not clearly specified in existing materials, but it can
be inferred and checked empirically (Brecke & Gerlach, 1972; Knoop &
Welde, 1973).

To the extent that performance time can be empirically validated
in a given maneuver, it is a potentially powerful indicator of
performance characteristics. Performance time is related to the
aircraft's flight path through the laws of motion and aerodynamics
(Etkin, 1972). It is logically possible for an observed flight path to
meet the requirement of a standard time and still contain important
deviations from the standard flight path. Such conditions will occur in
case of large compensating errors at different locations in a
performance. However, it is impossible for a performance to deviate
from the standard time and also be on all criteria throughout the
entire flight.

A Performance  State  Evaluation  Model 

Performance time can be used with maximum deviation to simplify
performance evaluations for selected maneuvers. To accomplish this
objective, one designates states of a performance and the flight path
conditions which must be satisfied at the endpoints of each state
(Brecke & Gerlach, 1972). Time is a variable in the set defining each
state. If the existing materials do not specify desired time values,
they can be inferred or obtained empirically from mastery performances
of experienced pilots. The level of experience needed for reliable
estimates remains to be investigated.


An algorithmic procedure is used to apply performance time with
existing ADCS data (Figure 7). If total time fails, some state or
states must be in error. The state or states in error can be located
by examining performance time for each state. For a state that fails
the time test, if the entry conditions to that state fail the
deviations test, at least some source of the error will be in the
preceding state or states. If the entry conditions for a state are
satisfied, i.e., all values are within the performance limits, the
source of the error must be located in some set of deviations between
the entry point and the end point of that state. If the end point
conditions fail, the next state should also exhibit some deviations.
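
The procedure just described can be sketched in Python. This is an
illustrative reading of the text, not a transcription of Figure 7. The
text does not state how a time test is decided, so the sketch assumes
a simple tolerance around the standard time. The class, the function
names, the tolerances, and the trial data are all hypothetical.

```python
from dataclasses import dataclass

# Finding labels (hypothetical wording chosen for this sketch).
PRECEDING = "source of error in a preceding state"
WITHIN = "source of error within this state"
NEXT = "expect deviations in the next state"


@dataclass
class State:
    name: str
    observed_time: float   # seconds, from the ADCS data
    standard_time: float   # seconds, from the standard flight path
    entry_ok: bool         # entry values within the performance limits
    exit_ok: bool          # end point values within the performance limits


def fails_time_test(observed, standard, tolerance):
    """Assumed time test: fail when |observed - standard| exceeds a tolerance."""
    return abs(observed - standard) > tolerance


def diagnose(states, total_tolerance, state_tolerance):
    """Return (state name, finding) pairs for states that fail the time test."""
    total_observed = sum(s.observed_time for s in states)
    total_standard = sum(s.standard_time for s in states)
    if not fails_time_test(total_observed, total_standard, total_tolerance):
        return []  # total time passes, so no state analysis is triggered
    findings = []
    for s in states:
        if not fails_time_test(s.observed_time, s.standard_time, state_tolerance):
            continue
        findings.append((s.name, WITHIN if s.entry_ok else PRECEDING))
        if not s.exit_ok:
            findings.append((s.name, NEXT))
    return findings


# Hypothetical trial: State 3 takes 15 s instead of 8 s and ends off limits.
trial = [
    State("1", 8.5, 8.0, True, True),
    State("2", 47.0, 48.0, True, True),
    State("3", 15.0, 8.0, True, False),
    State("4", 1.0, 1.0, False, True),
    State("5", 9.0, 8.0, True, True),
    State("6", 48.5, 48.0, True, True),
    State("7", 8.0, 8.0, True, True),
]
findings = diagnose(trial, total_tolerance=4.0, state_tolerance=2.0)
for name, finding in findings:
    print("State", name, "-", finding)
```

The sketch stops when total time passes, as the text describes, so a
performance with large compensating time errors would not be examined
state by state.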

Differences in means and variances for performance time were
strongly associated with similar differences in maximum altitude for
student performances on the Vertical S-A (Brecke, et al., 1974).
Maximum altitude is the endpoint between two adjacent transition states
in the Vertical S-A. Gerlach, Brecke, Reiser, and Shipley (1972)
predicted that students would have their greatest difficulty in
mastering the Vertical S-A performance in these two transition states.
Further analysis of these data revealed that the statistical
differences on these measures were related to differences in specific
instructions about pitch control (Brecke, et al., 1974).

In a subsequent study, Brecke (1975) found significant
interactions on error amplitude between type of instruction and
performance trial. Percent time and hit rate did not reveal similar
differences. Comparisons between treatment groups and a control group
were not significant on any of the three summary indicators. Brecke did
not carry out a performance time or maximum altitude deviation
analysis.

Figure 7.--Algorithm for the Analysis of Pilot Performances
by Performance States

Summary

Because of the many sources of variability in the flight
environment, lack of precision of control is a problem for both pilot
and evaluator in pilot-aircraft system performances. Until the use of
automated data collection systems (ADCS), little could be done to
objectively investigate relationships between precision of control and
performance evaluations. Results from recent studies using ADCS
procedures suggested that subjective ratings and conventional summary
statistics are not adequate indicators of precision of control because
of variabilities in pilot performances. Summary indicators based on
performance limit methods are better sources of evaluation, but these
indicators must be used with care because of potential ceiling effects.
Among the alternative indicators, performance time and maximum
deviation appeared to meet the requisite criteria, but more empirical
evidence was needed about the use of these measures.

To obtain empirical evidence about the use of performance time
and maximum deviation, the concept of performance states from control
theory was used to modify an existing evaluation model. In this
modified evaluation model, performance time is an indicator of need for
more detailed analyses. Maximum deviations indicate the sources of
performance errors in states that fail time tests. A state by state
analysis is carried out with procedures in the form of an algorithm.

CHAPTER III


METHODS

Three empirical investigations were carried out with objective
data to test a performance state evaluation model. In this model,
performance times from objective data are used as preliminary
indicators of performance quality and deviations from standard values
are used to identify specific errors. The objective data were obtained
from pilot performances in a flight simulator with ADCS (Shipley,
Gerlach, & Brecke, 1974). Data from performances by 2 experienced and
39 student pilots on an instrument flight maneuver were used in these
investigations. In the first investigation, it was hypothesized that
there would be no significant differences between the performances of
two experienced pilots on performance times compared with each other
and with standard time values. In the second investigation, total
performance times were analyzed with analysis of variance using a
2 x 2 mixed design and Dunnett's procedures were used to compare each
treatment group to an external control group. Maximum altitude
variances were tested for differences using F-ratios. Data for the
second investigation were taken from an experimental study by Brecke
(1975).

The first two investigations were carried out to establish a
basis to design the third and final investigation. In the final
investigation, the hypothesis was that performance times, maximum
altitude, and personal data about each subject would predict error
amplitude scores. The error amplitude scores from the Brecke (1975)
study were used as the criterion variable in a stepwise regression
analysis. The variates in the regression analysis were transformed time
and maximum altitude values and personal data for each subject. In this
chapter, the general methods used in the three investigations are
described, the first two investigations and their results are reported,
and the design and procedures used in the third investigation are
presented.

A Standard  Flight  Path 

A Test  Performance 

An instrument flight maneuver, the Vertical S-A, was used as the
basis of the performances examined in the three investigations. Brecke
and Gerlach (1972) analyzed materials from the Air Force Undergraduate
Pilot Training syllabus dealing with the Vertical S-A. They developed
a comprehensive set of values to represent the flight path for the
purposes of instructional development (Brecke, 1975; Brecke, Gerlach, &
Shipley, 1974; Gerlach, Brecke, Reiser, & Shipley, 1972). Shipley,
Gerlach, and Brecke (1974) also used these values and training
objectives to develop ADCS methods to evaluate performances of student
pilots participating in training experiments. In the present
investigations, the standard Vertical S-A flight path consisted of a
sequence of seven states: six extended states and one momentary state.
The seven states in the standard Vertical S-A flight path are:

1.  the transition from level flight to climbing flight;

2.  climbing flight;

3.  the transition from climbing flight to maximum altitude;

4.  maximum altitude;

5.  the transition from maximum altitude to descending flight;

6.  descending flight; and

7.  the transition from descending flight to level flight.
(These states are illustrated on the standard Vertical S-A altitude
curve in Figure 2, page 12.)


Expected  Times 

The flight path for the Vertical S-A, as described in the
training objectives, must be symmetric about the maximum altitude. That
is, the values for States 1 to 3 are identical to the values for States
5 to 7 except for the difference in direction: climb versus descent.
The principle of symmetry was used to derive the time to perform each
state in the standard flight path. Each state covers a specified change
of altitude; for example, 800 feet each in States 2 and 6. The rate of
vertical change is also specified at 1,000 feet per minute for these two
states. It follows that the time to perform these 2 states is 48
seconds each (800 feet divided by 16.67 feet per second).

In  the  transitions,  empirical  estimates  of  times  were  necessary 
because  the  vertical  rate  is  changing.  Estimates  of  the  time  to  perform 
the  transition  from  climb  to  descent,  States  3 to  5,  were  obtained  from 
observations  of  an  experienced  pilot's  performance.  An  instructor  pilot 
from  the  Air  Force  Human  Resources  Laboratory,  Flight  Training  Division, 
at Williams Air Force Base performed a series of ten Vertical S-A
maneuvers in a flight simulator. The average performance time from
15,900 to 16,000 and back to 15,900 feet was 16.38 seconds with a
standard deviation of 1.78 seconds.

Since the maneuver is symmetrical, it was assumed that each
transition state would require an equal performance time. The training
objectives specify that each transition covers 100 feet of altitude
change and that each has similar starting or ending rates of vertical
change: to or from 0 feet per second and to or from plus or minus
16.67 feet per second. An estimate of about 8 seconds (16.38 seconds
divided by 2) was obtained as the standard time for each of the
transition states. The total time for the standard Vertical S-A flight
path used in these studies was 129 seconds: 96 seconds for the 2 steady
states, 32 seconds for the 4 transition states, and 1 second for
maximum altitude. The standard time, altitude, and average vertical
rate values for each state are summarized in Table 2.
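
The arithmetic behind these standard times can be retraced with a short sketch. The variable names are illustrative assumptions and do not come from the original study; only the numbers are taken from the text. Rounding the halved turnaround time to a whole second mirrors the text's "about 8 seconds."

```python
# Sketch of the standard-time arithmetic for the Vertical S-A.
# All names are illustrative assumptions; the numbers come from the text.

VERTICAL_RATE_FT_PER_MIN = 1000.0   # specified rate in States 2 and 6
STEADY_ALTITUDE_CHANGE_FT = 800.0   # altitude change in States 2 and 6
TURNAROUND_TIME_SEC = 16.38         # observed mean, 15,900 -> 16,000 -> 15,900 ft
MAX_ALTITUDE_TIME_SEC = 1.0         # State 4

# 1,000 ft/min is about 16.67 ft/sec.
rate_ft_per_sec = VERTICAL_RATE_FT_PER_MIN / 60.0

# Steady states: altitude change divided by the vertical rate.
steady_time_sec = STEADY_ALTITUDE_CHANGE_FT / rate_ft_per_sec

# Transition states: half of the observed turnaround, rounded to a second.
transition_time_sec = float(round(TURNAROUND_TIME_SEC / 2.0))

# Two steady states, four transitions, one maximum-altitude state.
total_time_sec = (2 * steady_time_sec
                  + 4 * transition_time_sec
                  + MAX_ALTITUDE_TIME_SEC)

# 1,000 ft up plus 1,000 ft down over the whole maneuver.
average_rate_ft_per_sec = 2000.0 / total_time_sec

print(f"steady state: {steady_time_sec:.2f} sec")
print(f"transition:   {transition_time_sec:.2f} sec")
print(f"total:        {total_time_sec:.2f} sec")
print(f"average rate: {average_rate_ft_per_sec:.2f} ft/sec")
```

The last value reproduces the average vertical rate listed in the TOTAL row of Table 2.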


Data  Collection  Procedures 




TABLE 2

STANDARD TIMES, ALTITUDE CHANGES, AND VERTICAL RATES
FOR EACH PERFORMANCE STATE IN THE VERTICAL S-A

State    Time          Altitude Change    Average Vertical Rate

1          8.00 sec       +100 ft           +12.50 ft/sec
2         48.00 sec       +800 ft           +16.67 ft/sec
3          8.00 sec       +100 ft           +12.50 ft/sec
4          1.00 sec          0 ft                0 ft/sec
5          8.00 sec       -100 ft           -12.50 ft/sec
6         48.00 sec       -800 ft           -16.67 ft/sec
7          8.00 sec       -100 ft           -12.50 ft/sec

TOTAL    129.00 sec       2000 ft            15.50 ft/sec


Total  time  (TOTTIM)  and  maximum  altitude  time  were  defined  inclusively,  e.g.,  TOTTIM 
all  other  performance  state  times  were  simple  differences,  e.g.,  T_  - T, . 
The conditions of performance and the characteristics of the ADCS
data did not always permit altitude observations that were precisely
15,100 or 15,900 feet. In such cases, the nearest altitude values to
these criterion points were used. In determining the starting and
ending points, altitude values were used as the primary indicator. In
all cases the altitude value of 15,000 feet or the value nearest to
15,000 feet was used to indicate the start and end of the trial. For
all 234 trials in the Brecke (1975) study, the mean (M) and standard
deviation (SD) for starting and ending altitudes were (a) starting:
M = 14,980 feet, SD = 206 feet; and (b) ending: M = 14,940 feet,
SD = 320 feet.
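
The nearest-value rule described above can be sketched as follows. The sample altitudes and the function name are hypothetical; the ADCS record format is not described in the text, so this sketch assumes a simple list of altitude readings from the climbing portion of one trial.

```python
# Sketch of the "nearest altitude value" rule for locating criterion points.
# The function name and the sample readings are hypothetical.

def nearest_index(altitudes_ft, criterion_ft):
    """Return the index of the reading closest to the criterion altitude."""
    return min(range(len(altitudes_ft)),
               key=lambda i: abs(altitudes_ft[i] - criterion_ft))

# Hypothetical readings from the climbing portion of a trial.
climb_samples_ft = [14980.0, 15060.0, 15130.0, 15210.0]

start_index = nearest_index(climb_samples_ft, 15000.0)       # start of trial
state1_end_index = nearest_index(climb_samples_ft, 15100.0)  # end of State 1

print(start_index, climb_samples_ft[start_index])
print(state1_end_index, climb_samples_ft[state1_end_index])
```

Because the maneuver passes each criterion altitude once while climbing and once while descending, a real implementation would need to search the two portions separately.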


Some cases were identified where pitch and power values indicated
that a trial had started or ended at an altitude other than 15,000 feet.
The typical range for starting altitude was from 14,800 to 15,200 feet.
In cases of ADCS recording malfunctions, starting or ending altitudes
were observed at 15,700 feet, but adjacent altitude values were also
observed with differences as great as 600 feet. In such cases, the
pattern of altitude values, and values from power and pitch, were used
to identify the point in question and to derive its actual value.

Preliminary  Investigation  I 

Investigation  I was  carried  out  to  compare  the  standard  time 
values  in  Table  2 with  the  performance  times  of  two  experienced  pilots 
and  to  discover  whether  or  not  the  performance  state  evaluation  model 
would  reveal  any  differences  between  the  performances  of  these  two  pilots. 

If the performance state evaluation model detected differences between
performance times for experienced individual pilots, it would also
discriminate between performances of student pilots. Discriminations
between performances and standard values and between different
performances were considered essential to effective evaluations in
regular training and in experimental research on the effects of
training methods.

Method 

Two experienced pilots, one a researcher and the other an instructor
pilot, each performed a sequence of six trials on the Vertical S-A
maneuver in a flight simulator according to procedures described by
Brecke (1975). Originally, these performances were obtained as a part
of the tryout procedures of the ADCS device. Trials 1 and 6 of the
instructor pilot's performances were not included in the present
investigations. Trial 1 was a descending rather than a climbing
Vertical S-A; Trial 6 was unusable because of an ADCS malfunction.

Procedures 

Performance  time  and  maximum  altitude  data  were  obtained  from 
computer  printouts  of  the  ten  remaining  performances  (see  Table  3). 

Means and standard deviations for total time, time for each performance
state, and maximum altitude were computed for each pilot's data. The
means were tested with t-tests at the .05 level. Absolute values were
obtained for deviations of observations from corresponding standard
values on each of the indicators named above. Standard deviations, σ,
were computed from these absolute deviations for each indicator to use
in subsequent research. Tests were also carried out with these
deviation values using normal deviate, z-score, methods to identify
performances or performance states that were significantly different
from the standard values at the .05 level. Each statistical test was
made using two-tailed values and the tests on the standard values were
made using confidence limit procedures (Hays, 1963).


Results

The means, standard deviations, and 95% confidence limits for
each indicator are given in Table 4. Three significant differences
were found in the sample values. These significant differences were
the means for total times: t = 4.02, df = 8; the means for State 2
times: t = 2.64, df = 8; and the variances for State 5 times: not
homogeneous, F = 20.95 on 1 and 3 degrees of freedom (significant
at or beyond the .05 level using Hartley's test [Myers, 1966]).

Among the total time deviations, one performance by the researcher
pilot was found to be significantly different. The total time deviation
for his third performance was 24 seconds less than expected.
Table 4 includes a list of 95% confidence limits for deviations from
standard times. On the basis of these limits, one performance state
by the researcher pilot was also identified as significantly deviant.
In performance one the time for State 6 was 15 seconds less than expected.
No performance times of the instructor pilot were significantly deviant.

There were no significant differences between maximum altitudes
for these two experienced pilots. Means and standard deviations for
maximum altitude were: (a) for the researcher pilot: M = 15,987.78
feet, SD = 27.37 feet; and (b) for the instructor pilot: M = 15,995.36
feet, SD = 27.27 feet.


These  limits  are  for  use  with  deviations  of  an  observed  from  a standard  time  in  seconds 
Discussion

It was possible to discriminate between the average performances
of two experienced pilots on the basis of their performance times.
Likewise, it was possible to locate significant deviations in their
performance states by attending to deviations from the standard times.
Acceptable total performance times must be interpreted carefully because
patterns of too little and too much time in the performance states will
cancel each other. For example, in the first performance of the
researcher pilot, a total time deviation of -9 seconds was not
significant, although the deviation of -15 seconds for State 6 was
significant. In this performance, accumulated deviation time for
performance States 1 to 3 was +11 seconds, while the accumulated
deviation time for performance States 5 and 6 was -20 seconds. On the
other hand, a significant total deviation time may be accumulated over
a series of performance states. In performance three by the researcher
pilot, the total deviation, -24 seconds, was accumulated as -28 seconds
across States 1, 2, 5, 6, and 7 although none of the deviations for an
individual state was significant.
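
The cancellation in the first example can be made explicit with a small sketch. It uses only the two accumulated deviations reported above and assumes, consistent with the reported total of -9 seconds, that the remaining states contributed no net deviation. The variable names are illustrative.

```python
# Sketch: signed deviations cancel in the total, absolute deviations do not.
# Values are the accumulated deviations reported for the researcher pilot's
# first performance; names are illustrative assumptions.

deviation_states_1_to_3_sec = 11.0    # too much time during the climb
deviation_states_5_and_6_sec = -20.0  # too little time during the descent

signed_total_sec = deviation_states_1_to_3_sec + deviation_states_5_and_6_sec
absolute_total_sec = (abs(deviation_states_1_to_3_sec)
                      + abs(deviation_states_5_and_6_sec))

print(f"signed total deviation:   {signed_total_sec:+.0f} sec")
print(f"absolute total deviation: {absolute_total_sec:.0f} sec")
```

The signed total understates how far the performance departed from the standard, which is the reason for examining individual performance states.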

The statistical tests carried out in the present investigation
were based on an assumption that performance times were normally
distributed about the standard value of 129 seconds. That assumption is
questionable for the data in the ten performances examined in this
investigation. The performances of the instructor pilot ranged from
124-135 seconds; the mean time of his four trials, 129.5, was equivalent
to the standard total time of 129 seconds. However, for the researcher
pilot all the performance times ranged from 105-122 seconds, and his
mean total time, 115.15, was significantly less (t = -5.18, df = 5,
p < .01) than the standard time of 129 seconds. A definitive test of
the assumption of normally distributed time values cannot be made,
however, without more data from performances of experienced pilots. The
standard times from Table 2 and the estimated population standard
deviations (σ) from Table 4 were used in the main investigation in this
study.

Preliminary  Investigation  II 

Investigation II was carried out to determine whether total
performance time or maximum altitude would discriminate between
performances of treatment groups in a training experiment. An a priori
prediction was that differences among maximum altitude variances would
discriminate among treatment group performances. This prediction was
based on a performance analysis (Gerlach, Brecke, Reiser, & Shipley,
1972) and on the outcomes of a prior experiment (Brecke, Gerlach, &
Shipley, 1974). The values used in the present investigation were
obtained from performance data produced by Brecke (1975) in an
extension of the 1974 experimental research.

Method 

Thirty-nine student pilots were randomly assigned to one of five
groups. Members of the four experimental groups studied objectives and
different preflight instructions on how to perform the Vertical S-A;
members of a control group studied only the objectives. After each
subject had studied his assigned materials, he performed a sequence of six


(1975).  Data  on  each  subject's  performances  in  the  flight  simulator 
was  collected  using  ADCS. 

Procedures 

Total  performance  time  and  maximum  altitude  values  were  obtained 
from  the  Brecke  data  using  procedures  described  in  Table  3. 

Performance time. The total time values were analyzed using
analysis of variance with a two between- one within-subjects design;
trials was the within-subjects variable. Dunnett's method (Myers, 1966)
was used to compare the means of the treatment groups to the means of
the control group. The statistical hypotheses of no significant main
effects or interactions and no significant contrasts between treatment
and control groups were tested.

Maximum altitude. On the basis of a performance analysis,
differences in performances were to be expected at the maximum altitude
(Gerlach, et al., 1972). In previous research, differences in
treatment group variances (heterogeneity of variance) were found to be
significant indicators of treatment effects (Brecke, et al., 1974).
Tests of differences in variances were used in this investigation
rather than tests for differences in means, i.e., ANOVA or t-tests.
The hypotheses were based on a priori predictions of expected
differences in the variabilities of performances among the treatment
groups.

In the present investigation, it was hypothesized that there
would be significant differences, .05, among variances on maximum
altitude given type of experimental treatment. There were five treatment
groups (a 2 x 2 design with control group) in the Brecke (1975) study.
It was predicted that the variances for two groups (A1B1 and A1B2)
receiving experimental instruction would be significantly smaller,
.05, than variances for the control group (C) or variances for two
groups (A2B1 and A2B2) receiving current instruction. A one-tailed
test was used with this directional hypothesis and two-tailed tests
were used to test the hypothesis of no other significant differences
among the remaining group variances. The critical F values used to
carry out these tests are given in Table 5.


Results


Total time. In the total time ANOVA, there were significant
main effects on type of instruction, number of practice items in the
instructional program, and performance trials. A significant
interaction was also observed between levels of practice items and
performance trials. The ANOVA summary is given in Table 6 and the means
for the significant interaction and trials are given in Table 7. The
means for the effects due to type of instruction were: (a) experimental,
111.67, and (b) current, 131.16. For the effects due to levels of
practice, the means were: (a) low, 128.10, and (b) high, 114.72.
There were no significant contrasts on Dunnett's test between any
treatment group and the control group.

Maximum altitude. The comparisons among the treatment group
variances were significant as predicted (Table 8). At maximum altitude,
variances for the two groups receiving experimental instructions were
significantly (.05) smaller than the variance for the control group and
than variances for the two groups receiving current instruction. The
observed F-ratios in these contrasts range from 4.06 to 10.39. The
TABLE 5

CRITICAL F-VALUES FOR TESTS OF PREDICTED DIFFERENCES
IN VARIANCES

                        Treatment

   Experimental         Control         Current
   A1B1     A1B2           C         A2B1     A2B2

            1.88a        1.69b       1.69b    1.69b
             --          1.69b       1.69b    1.69b
                          --         1.88a    1.88a
                                      --      1.88a

a Two-tailed tests at .05.

b One-tailed tests at .05.

NOTE: There were either 42 (Group C) or 48 observations in each
variance. The critical values in this table are based on 40 degrees
of freedom for each variance. These degrees of freedom lead to
slightly conservative tests. The two-tailed values are taken
from tables at .025.

TABLE 6

ANALYSIS OF VARIANCE: TOTAL TIME

Source                        SS      df         MS        F       p

Total                  157872.50     191     826.56
Between                 73560.66      31    2372.92
  Instruction (A)       18232.51       1   18232.51    10.95    .005
  Practice Items (B)     8600.13       1    8600.13     5.16    .05
  A x B                    49.01       1      49.01      .03    n.s.
  Error                 46679.02      28    1667.11
Within                  84311.84     160     526.95
  Trials (T)             9314.71       5    1862.94     4.20    .005
  T x A                  3564.33       5     712.87     1.61    n.s.
  T x B                  5151.34       5    1030.27     2.32    .05
  T x A x B              4143.83       5     828.77     1.87    n.s.
  Error                 62137.62     140     443.84

TABLE 7

MEANS OF SIGNIFICANT EFFECTS OF TRIALS AND PRACTICE BY TRIALS
INTERACTION ON PERFORMANCE TIME(a)

Practice                            Trials
Items        One      Two      Three     Four     Five      Six

Low        146.69   115.15    132.44   121.56   122.63   130.13
High       124.31   115.56    112.63   122.75   107.06   106.00

Trials     135.50   115.38    122.53   122.16   114.84   118.06

(a) In this table, any contrast greater than 18.43 seconds is significant
at or beyond .05 (Scheffe multiple comparison limits [Myers, 1966]).

TABLE 8

OBSERVED COMPARISONS OF VARIANCES ON
VERTICAL S-A MAXIMUM ALTITUDE

                                    Treatment

                    Experimental          Control           Current
                  A1B2       A1B1            C          A2B1        A2B2

Variances       5046.68    7032.68       28554.24     37419.03    52418.10

Comparisons(a)     1.00       1.37          5.66*        7.41*      10.39*
                              1.00          4.06*        5.32*       7.45*
                                            1.00         1.31        1.84
                                                         1.00        1.40

(a) 1.00 as the leading value in a row indicates the variance in that
column was used as the denominator in the F-ratio.

*F-ratios significant at or beyond the predicted .05 values; each of
these F-ratios would have been significant tested at the .001 level.

control group was not significantly less variable in performance at
maximum altitude than the groups receiving current instruction.
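
The predicted contrasts can be retraced from the variances in Table 8. The sketch below forms each F-ratio as the larger variance over the smaller and compares it with the one-tailed critical value of 1.69 from Table 5. The dictionary layout and names are illustrative assumptions, and the group labels follow the column order as read from Table 8.

```python
# Sketch: F-ratios for the six predicted contrasts at maximum altitude.
# Variances and the critical value are taken from Tables 8 and 5;
# names and data layout are illustrative assumptions.

variances = {
    "A1B2": 5046.68,   # experimental
    "A1B1": 7032.68,   # experimental
    "C": 28554.24,     # control
    "A2B1": 37419.03,  # current
    "A2B2": 52418.10,  # current
}
ONE_TAILED_CRITICAL_F = 1.69  # .05 level, 40 df per variance

def variance_ratio(larger_group, smaller_group):
    """F-ratio with the smaller (experimental) variance as denominator."""
    return variances[larger_group] / variances[smaller_group]

predicted_pairs = [(other, experimental)
                   for experimental in ("A1B2", "A1B1")
                   for other in ("C", "A2B1", "A2B2")]

ratios = {pair: round(variance_ratio(*pair), 2) for pair in predicted_pairs}
all_significant = all(f > ONE_TAILED_CRITICAL_F for f in ratios.values())

for (other, experimental), f in ratios.items():
    print(f"{other} / {experimental}: F = {f:.2f}")
print("all predicted contrasts exceed 1.69:", all_significant)
```

The smallest and largest of these six ratios correspond to the range of 4.06 to 10.39 reported in the text.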

Further analyses of variances for trials and subjects provide
additional evidence that differences in variability may be important
indicators of differences of quality among pilot performances. Cochran's
test, C, is commonly used to test for heterogeneity of variance prior to
an ANOVA (Myers, 1966; Winer, 1971). As a test value, C is defined as
the ratio of the largest variance to the sum of all the variances in
a set. Conceptually, C represents the percent of variance accounted
for by the largest variance in a set of variances.

To illustrate, for the variances tested in Table 8, the variance
for group A2B2, 52418.10, accounts for 40% of the total variance. On
the other hand, the combined variances (12079.36) for the groups
receiving experimental instructions (A1B1 and A1B2) account for only 9%
of the total. Among the possible tests of C for all trials and all
subjects, single variances were identified that accounted for as much
as 90% of the total variance in some groups. For example, among all
possible comparisons of variances for each subject's performances, both
the largest (361802.25) and the next to smallest (231.34) were located
in one treatment group, A2B1. Among this set of all student pilot
variances, the smallest was 187.42; compared to the variances of the two
experienced pilots, 749.12 and 743.65, this smallest student pilot
variance is not significant. A variance of 5228.86 or larger is required
to be significant, .05, when compared to the experienced pilots'
variances.
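
The two percentages in this illustration follow directly from the definition of C. A minimal sketch, using the five group variances from Table 8 (the function and variable names are illustrative assumptions):

```python
# Sketch: Cochran's C as the largest variance over the sum of all variances.
# Variances are the five group values from Table 8; names are illustrative.

def cochran_c(variances):
    """Return the share of total variance held by the largest variance."""
    return max(variances) / sum(variances)

experimental_variances = [5046.68, 7032.68]
other_variances = [28554.24, 37419.03, 52418.10]
group_variances = experimental_variances + other_variances

c_value = cochran_c(group_variances)
experimental_share = sum(experimental_variances) / sum(group_variances)

print(f"C for the largest variance: {c_value:.2f}")
print(f"share of the two experimental groups: {experimental_share:.2f}")
```

Testing C for significance requires tabled critical values, which this sketch does not attempt to reproduce.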
Discussion

Performance time and maximum altitude enable one to discriminate
between performances of individuals and experimental treatments. To
the extent that precision of pitch and power control in the Vertical
S-A is reflected in variations of performance at maximum altitude,
these findings provide strong support for the hypothesis that specific
indicators should be considered. Brecke (1975) found that the groups
receiving formal instructions, experimental or current, required about
twice as much study time as the control group. He also failed to find
significant differences between control group and treatment group
performances on his dependent variables. On this evidence, he questioned
the time and effort required to prepare experimental instruction. In
the present investigation, performances of groups receiving the
experimental instructions were about four to five times less variable
than those of the control group and about seven times less variable than
those of groups receiving current instruction. On the basis of this
evidence, it is conceivable that a reduction of four or more times in
variability of performance early in training is a highly desirable
outcome.

Design  of  the  Main  Investigation 

In the second preliminary investigation, it was shown that
analyses of specific indicators, i.e., variance at maximum altitude in
the Vertical S-A, would lead to different results from analyses carried
out on summary indicators like hit rate, percent time on criterion, and
error amplitude. Two alternatives were available in the design of this
third investigation: (a) to identify particular locations of
differences in the student pilot performances of Brecke's 1975 study, or
(b) to examine relationships between summary and specific indicators
empirically. To identify particular locations of differences, methods
described in the performance state evaluation model would have been used.
The first alternative was rejected because the utility of the
performance state model had been demonstrated in the first preliminary
investigation.

The main investigation was designed to examine relationships
between summary and specific indicators. The hypothesis was that a
set of selected specific indicators would predict values of a summary
indicator. If summary indicator scores from the Brecke (1975) data could
be predicted from the selected specific indicators, the results of the
main investigation would support the replacement of summary indicators
with selected simple indicators. A multiple regression analysis was
designed to test this hypothesis.

Method 

Design. Because error amplitude, E(T), is theoretically an
indicator of variability in pilot performances, it was selected as the
criterion variable in the multiple regression analysis. Brecke (1975)
used ANOVA to analyze scores on error amplitude. The results of his
analysis are given in Table 9 (Brecke, 1975, p. 64). In the present
study, Brecke's summed error amplitude scores were used as the
criterion. The presence of significant effects in his ANOVA (Table 9)
and the results of the second preliminary investigation in the present
study led to a decision
TABLE 9

ANALYSIS OF VARIANCE: ERROR AMPLITUDE E(a)

Source                        SS      df        MS        F        p

Total                    2436.03     191     12.75
Between                  1112.66      31     35.89
  Instruct. cues (A)      115.20       1    115.20     3.38    .0735
  Practice (B)             21.41       1     21.41      .63    .5593
  A x B                    21.26       1     21.26      .62    .5577
  Error                   954.79      28     34.10
Within                   1323.37     160      8.27
  Trials (D)              119.52       5     23.90     3.14    .0104
  D x A                    86.30       5     17.26     2.26    .0508
  D x B                    13.09       5      2.62      .34    .8856
  D x A x B                37.07       5      7.41      .97    .5619
  Error                  1067.39     140      7.62

(a) "E" signifies that error amplitude scores, E(T), were summed across
five performance variables: airspeed, heading, vertical rate, pitch,
and power (Brecke, 1975, p. 6).

to carry out a regression analysis on each performance trial with
correlations averaged across treatment groups.

Performance state times, total time, maximum altitude, and measures
of prior experience were used as the predictors or variates.
Three combinations of performance state times were also included as
predictors in the design. These combinations were the sums of
performance state times 1 to 3, 5 to 7, and 1 to 7. The prior experience
variates were total hours as a pilot, flying time in training, total
hours in the T-4 simulator, and number of minutes to complete an
instructional program prior to performance of the Vertical S-A in a
T-4G flight simulator. The regression design is summarized and the
labels used for the variables are given in Table 10.

Data. Data used in this investigation was obtained from a previous
study by Brecke (1975). In that study, 39 student pilots in UPT
studied one of five sets of materials prior to performing a sequence
of six trials on the Vertical S-A. In addition to data collected on
six simulator performance variables and time, Brecke obtained data on
the time taken by each subject to study the training materials and data
on the prior experience of each student pilot. He obtained E(T) scores
from the data collected with ADCS during each performance using methods
developed by Shipley, et al. (1974). The E(T) scores used in Brecke's
ANOVA were summed across five of the six simulator performance variables:
airspeed, heading, vertical rate, pitch, and power.²

² The Brecke analysis did not include altitude data because ADCS
malfunctions during the collection process made it impossible to use a
computer to compute scores from the altitude data.
TABLE 10

SUMMARY OF VARIABLES IN DESIGN OF REGRESSION ANALYSIS

Variable   Variable
Number     Type          Variable Name                        Mnemonic

 1         Criterion     Error Amplitude, Summed              Error
 2         Performance   Total Time, transformed              TOTTIM
 3         Performance   State One Time, transformed          Time 1
 4         Performance   State Two Time, transformed          Time 2
 5         Performance   State Three Time, transformed        Time 3
 6         Performance   State Four Time, transformed         Time 4
 7         Performance   State Five Time, transformed         Time 5
 8         Performance   State Six Time, transformed          Time 6
 9         Performance   State Seven Time, transformed        Time 7
10         Performance   Maximum Altitude, transformed        MAXALT
11         Composite     Sum Times 1-3                        Sum 1
12         Composite     Sum Times 5-7                        Sum 2
13         Composite     Sum Times 1-7                        Sum 3
14         Personal      Total Hours as Pilot                 TOTHRS
15         Personal      Flying Time in UPT                   UPTHRS
16         Personal      Time in T-4 Simulator                SIMHRS
17         Personal      Study Time (Minutes)                 Study


Procedures 

Time and maximum altitude data, obtained in the second preliminary
investigation, were transformed prior to the regression analysis. The
transformations were made to account for possible offsetting compensations
on performance times and to obtain normalized data. First, the standard
values for time (Table 2) and 16,000 feet for maximum altitude were
subtracted. Second, the absolute value of each difference was divided by
the associated population standard deviation (Table 4). In the Brecke
(1975) study, error amplitude scores were computed as standardized
deviations from performance limits about the standard flight path.
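
The two-step transformation amounts to an absolute standardized deviation. The sketch below illustrates it; the function name is an assumption, and the standard deviations passed in are hypothetical placeholders because the Table 4 values are not reproduced in this section.

```python
# Sketch of the transformation applied to times and maximum altitude:
# subtract the standard value, take the absolute value, divide by sigma.
# The sigma values used below are hypothetical, not the Table 4 estimates.

def transform(observed, standard, sigma):
    """Absolute deviation from the standard, in standard-deviation units."""
    return abs(observed - standard) / sigma

STANDARD_TOTAL_TIME_SEC = 129.0   # Table 2
STANDARD_MAX_ALTITUDE_FT = 16000.0

HYPOTHETICAL_SIGMA_TIME_SEC = 8.0
HYPOTHETICAL_SIGMA_ALTITUDE_FT = 25.0

fast_trial = transform(105.0, STANDARD_TOTAL_TIME_SEC,
                       HYPOTHETICAL_SIGMA_TIME_SEC)
slow_trial = transform(153.0, STANDARD_TOTAL_TIME_SEC,
                       HYPOTHETICAL_SIGMA_TIME_SEC)
low_peak = transform(15950.0, STANDARD_MAX_ALTITUDE_FT,
                     HYPOTHETICAL_SIGMA_ALTITUDE_FT)

print(fast_trial, slow_trial, low_peak)
```

Because the absolute value is taken before scaling, a trial that is 24 seconds fast and one that is 24 seconds slow receive the same score, so opposite deviations cannot offset each other when the transformed state times are summed.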

After the data transformations, correlation matrices were obtained
for each treatment group in the design of the Brecke study on
each of the six performance trials. Each correlation in the set of 30
matrices was then transformed to a Fisher's Z (Hays, 1963), the Z values
were averaged across the five matrices for each trial, the means were
re-converted to correlations, and the mean correlations were analyzed
with a stepwise multiple regression. Trial effects, seen as an
improvement in performance across trials, were present in the error
amplitude scores. Twelve separate regression analyses, two on each
trial, were used to avoid confounding correlations between criterion and
predictors with the trial effects.
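
The averaging step can be sketched with the standard Fisher transformation, Z = arctanh(r), and its inverse, r = tanh(Z). The five correlations below are hypothetical stand-ins for one cell of the five treatment-group matrices on a single trial; the function name is an assumption.

```python
# Sketch: average correlations across groups through Fisher's Z.
# The input correlations are hypothetical illustrations.
import math

def mean_correlation(correlations):
    """Average correlations by transforming to Fisher's Z and back."""
    z_values = [math.atanh(r) for r in correlations]  # r -> Z
    mean_z = sum(z_values) / len(z_values)            # average in Z units
    return math.tanh(mean_z)                          # Z -> r

# One cell (e.g., criterion with one predictor) from five group matrices.
group_correlations = [0.62, 0.48, 0.71, 0.55, 0.66]
averaged = mean_correlation(group_correlations)

print(f"mean correlation via Fisher's Z: {averaged:.3f}")
```

Repeating this for every cell yields one matrix of mean correlations per trial, which is the input described for the stepwise regression.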

A standard stepwise multiple regression program in the Statistical
Package for the Social Sciences (Nie, Bent, & Hull, 1970) was
used for two different analyses of each matrix of mean correlations
for each trial: (a) the variables designated as performance or
composite in the design (Nos. 2-13), and (b) all variables (Table 10).
Conventional stepwise regression options set in the program were used:
no limit on number of steps (maximum allowable is 80); a .01 level to
add a given variable to the regression equation; and a .001 tolerance
of linear relationship between a variable to be added and those already
in the equation. Sample size was set at 3^ in each analysis. At each
step in the analysis, the output included an ANOVA summary, a summary
of the relative contributions of each variable selected, a summary of
the relative potential contribution of the remaining partial
correlations, and a listing of the beta and b weights.


CHAPTER  IV 


This chapter is a report of tests of the hypothesis that a
select set of specific indicators could be used to replace summary
indicators. Twenty-four multiple regression analyses and one
correlation between trial means were used to test this hypothesis.
Support for the hypothesis would be obtained if two criteria were
satisfied: (a) if a small set (4 to 6) of specific indicators were
consistently selected in the regression equations; and (b) if equations
using these indicators accounted for a substantial proportion (50% or
more) of the variance in the criterion. In this chapter, the results of
12 planned regression analyses are reported and the results of 12 post
hoc regression analyses are presented. Finally, results of a post hoc
correlation between criterion means across trials and means of a
selected specific indicator, maximum altitude, are reported.
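
Criterion (b) can be checked mechanically against the proportions of variance reported for the first five variables in each trial equation of the first analysis (Table 11). The list layout and names below are illustrative assumptions; criterion (a), consistency of selection, depends on which variables enter each equation and is not covered by this sketch.

```python
# Sketch: check criterion (b), at least 50% of criterion variance explained.
# Proportions are those reported in Table 11 for the first five variables
# in each of the six trial equations; names are illustrative assumptions.

MINIMUM_PROPORTION = 0.50

first_five_proportions = [0.53, 0.93, 0.83, 0.65, 0.77, 0.96]  # trials 1-6

criterion_b_met = all(p >= MINIMUM_PROPORTION
                      for p in first_five_proportions)
lowest = min(first_five_proportions)
highest = max(first_five_proportions)

print("criterion (b) met on every trial:", criterion_b_met)
print(f"range: {lowest:.2f} to {highest:.2f}")
```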

Results  of  Planned  Analyses 

Two  sets  of  stepwise  multiple  regressions  were  carried  out  on 
mean  correlations  from  each  of  six  performance  trials.  Error  amplitude, 
an  objective  indicator  of  variability  in  pilot  performance  data,  was 
used  as  the  criterion  in  all  analyses.  One  analysis  consisted  of  12 
variables  taken  from  performance  data.  In  the  second  analysis,  four 
variables  from  student  pilots'  personal  data  were  added  to  the  set  of 
variables available for selection. In the present section, the results
of  the  12  regression  analyses  are  summarized. 

Regression  on  Performance  Indicators 

Of  the  16  variables  in  the  complete  design  (Table  10),  12  were 
indicators  taken  from  performance  data  (9  predictors)  or  sums^  of  these 
indicators  (3  predictors).  These  12  variables  were  used  in  the  first 
analyses  because  of  their  relationship  to  a performance  state  evaluation 
model  of  the  standard  Vertical  S-A  flight  path  (Figure  2 and  Table  2). 

To  the  extent  that  performance  states  are  a valid  model  and  variables 
such  as  performance  time  or  deviations  from  maximum  altitude  are  valid 
as  indicators  of  performance  quality,  a subset  of  these  12  indicators 
should  account  for  a substantial  proportion  of  the  variance  in  the 
criterion,  error  amplitude.  To  test  this  hypothesis,  six  stepwise 
multiple  regressions  (SPSS  program)  were  carried  out  with  each  variable 
given  equal  weight. 
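
The selection procedure can be illustrated with a minimal Python sketch. It is not the SPSS stepwise program used in the study: it performs plain forward selection, adding at each step the variable that most increases the proportion of variance accounted for, and it omits the F-to-enter and F-to-remove checks of a full stepwise routine. The function names, the synthetic data, and the five-step limit are assumptions made for illustration only.

```python
import numpy as np


def r_squared(X, y):
    """Proportion of variance in y accounted for by a least-squares fit on X."""
    A = np.column_stack([np.ones(len(y)), X])      # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def forward_stepwise(X, y, max_steps=5):
    """Greedy forward selection: at each step add the column that raises R^2 most."""
    selected, history = [], []
    remaining = list(range(X.shape[1]))
    for _ in range(min(max_steps, len(remaining))):
        best = max(remaining, key=lambda j: r_squared(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
        history.append(r_squared(X[:, selected], y))
    return selected, history


# Synthetic stand-in for one trial: 60 hypothetical cases, 12 hypothetical indicators.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 12))
y = 2.0 * X[:, 3] + 1.0 * X[:, 7] + rng.normal(scale=0.5, size=60)

order, r2 = forward_stepwise(X, y)
for step, (col, value) in enumerate(zip(order, r2), start=1):
    print(f"step {step}: column {col}, cumulative R^2 = {value:.2f}")
```

The cumulative R^2 after the fifth step corresponds to the "first five variables" proportion reported for each trial equation.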

In Table 11, a summary is given of the variables selected in
each analysis, their order of selection, and proportion of variance
accounted for at two locations in each equation: (a) after the first
five variables, and (b) at the end of the complete equation. Five
variables were the fewest that would predict 50% of the variance in
every equation. With no more than five variables, the equations
accounted for at least 53% of the variance; the range was from 53% to
96%. With all the variables selected, the proportion of variance
ranged from 55% to 97%. The increase beyond five variables was small
or negligible except for the Trial 4 equation with a 17% increase.
Among the 12 possible predictors, 2 were never selected among the first
5, while 7 were selected 3 or more times. To summarize, 58% (7 of 12)
of the variables accounted for 83% (25 of 30) of the possible selections.
Detailed results, i.e., ANOVAs and summary tables of the equations, are
contained in Appendix A.

^In this design, sums were justified because absolute deviations
from standard values were used. These sums were included to test for
two possible relationships: (a) compensating differences in total
time, and (b) departure from symmetry.

TABLE 11

SUMMARY OF FIRST FIVE VARIABLES IN EACH TRIAL EQUATION SELECTED
FROM TWELVE PERFORMANCE INDICATORS IN FIRST ANALYSIS

Design
Number   Mnemonic(a)   Orders of selection   f

   2     TOTTIM        3, 1, 1               3
   3     TIME 1
   4     TIME 2        5, 1, 1, 1            4
   5     TIME 3        4, 5, 2, 2            4
   6     TIME 4
   7     TIME 5        2, 3                  2
   8     TIME 6        4, 2                  2
   9     TIME 7        3                     1
  10     MAXALT        3, 2, 3               3
  11     SUM 1         3, 2, 4, 5, 4         5
  12     SUM 2         5, 4, 5               3
  13     SUM 3         1, 4, 5               3

                                    Trial
                            1     2     3     4     5     6
Number in complete
  equation:                 8     6     7     7     9     8
Proportion of variance:
  First 5 variables:       .53   .93   .83   .65   .77   .96
  Complete equation:       .55   .93   .90   .82   .81   .97

(a) Names of each variable are given in Table 10.
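
The summary percentages follow directly from the selection frequencies (the f column of Table 11). The short sketch below recomputes them. The frequencies are as read from the table; those for Time 6 and Time 7 are taken as 2 and 1, an assumption that agrees with the 30 possible selections (6 equations by 5 steps) reported in the text.

```python
# Selection frequencies among the first five variables, as read from Table 11.
# The values for TIME 6 and TIME 7 are assumptions consistent with a total of 30.
frequency = {
    "TOTTIM": 3, "TIME 1": 0, "TIME 2": 4, "TIME 3": 4,
    "TIME 4": 0, "TIME 5": 2, "TIME 6": 2, "TIME 7": 1,
    "MAXALT": 3, "SUM 1": 5, "SUM 2": 3, "SUM 3": 3,
}

total_selections = sum(frequency.values())            # 6 equations x 5 steps
frequent = [name for name, f in frequency.items() if f >= 3]
frequent_selections = sum(frequency[name] for name in frequent)

share_of_variables = len(frequent) / len(frequency)
share_of_selections = frequent_selections / total_selections

print(f"{len(frequent)} of {len(frequency)} variables "
      f"({share_of_variables:.0%}) account for "
      f"{frequent_selections} of {total_selections} selections "
      f"({share_of_selections:.0%})")
```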

At this point, it was not possible to determine precisely how
well the seven variables selected most frequently would perform as
predictors. That is, the decision to use frequency of selection (in three
or more equations) did not include information about order of selection
or relative contribution of each variable selected to an equation. For
example, one variable, Time 5, was selected in two regression equations
as second or third predictor. This variable was associated with moderate
(.11) to substantial (.32) increases in the proportion of variance
accounted for. A five-step regression analysis was designed to test
this subset of seven most frequently selected variables. The results
of this post hoc analysis are discussed later in this chapter.

Regression  on  All  Variables 

In addition to performance indicators, one would expect that
measures of prior or related experience might be correlated with
quality of performance early in training. Brecke (1975) examined this
hypothesis as a possible basis for using analysis of covariance rather
than ANOVA to analyze his dependent variables. He correlated measures
of prior or related personal experience with scores on error amplitude,
hit rate, and percent time on criterion. Although some of the
correlations were significant at the .05 level, Brecke concluded that none
were sufficiently large, i.e., r ≥ .60, to include these measures in his
design as covariates. Such measures might make moderate to substantial
contributions to proportion of variance accounted for in a multiple
regression analysis. In the present analysis, four measures from
student pilots' personal data were added to the 12 performance
indicators (Table 10) and a second set of stepwise regression equations
was developed.

A summary** of the variables selected is given in Table 12.
Criteria similar to those in the previous analysis were used:
consideration of the first five variables selected in each equation and
proportion of variance accounted for by each set of five variables. Each
measure from personal data appeared in at least one regression
equation. Total pilot hours (TOTHRS) and study time (STUDY) were selected
two or more times. Relative to performance indicators alone (Table 11),
measures from personal data improved equation effectiveness with one
exception. In the equation for Trial 2, simulator hours (SIMHRS) was
selected as the best predictor instead of Time 2 and the net effect
was a 12% decrease in proportion of variance at the fifth step.

These are nearly optimum equations and these R² values would be
expected to shrink when used with other data. As an alternative to
cross validation, a restricted set of nine variables was selected and
a second five-step analysis was carried out. Three of the 16 variables
were not selected among the first five in any of the six equations in
the present analysis; two of these (Time 1 and Time 7) were not selected
in the previous analysis. Four of the remaining 13 variables were
selected only once. These seven were eliminated leaving a subset of 9
to be used in a post hoc analysis described later.

**As in Table 11, order of selection is presented along with the
number of variables and the proportion of variance due to regression
for the complete equation.

TABLE 12

SUMMARY OF FIRST FIVE VARIABLES IN EACH TRIAL EQUATION SELECTED
FROM SIXTEEN VARIABLES IN COMPLETE DESIGN

Design
Number   Mnemonic(a)   Orders of selection   f

   2     TOTTIM        4, 1, 1               3
   3     TIME 1
   4     TIME 2        2, 1, 1               3
   5     TIME 3        3, 5, 5, 5, 2, 2      6
   6     TIME 4        3                     1
   7     TIME 5        3                     1
   8     TIME 6
   9     TIME 7
  10     MAXALT        3, 3                  2
  11     SUM 1         2, 4, 4               3
  12     SUM 2         4, 4                  2
  13     SUM 3         1, 4                  2
  14     TOTHRS        2, 3, 5               3
  15     UPTHRS        5                     1
  16     SIMHRS        1                     1
  17     STUDY         2, 5                  2

                                    Trial
                            1     2     3     4     5     6
Number in complete
  equation:                 7     9     9     9    11     7
Proportion of variance:
  First 5 variables:       .97   .81   .82   .94   .77   .98
  Complete equation:       .99   .97   .99  1.00   .97  1.00

(a) Names of each variable are given in Table 10.
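
The expected shrinkage of R² noted above can be made concrete with the familiar adjusted R² correction, which discounts a sample R² for the number of predictors relative to the number of cases. This is only a sketch: the study does not report this correction, and the number of cases used below (N_CASES = 30) is a hypothetical value, not a figure from the data.

```python
def adjusted_r2(r2, n_cases, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1)."""
    dof = n_cases - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more cases than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_cases - 1) / dof


N_CASES = 30          # hypothetical sample size, for illustration only
N_PREDICTORS = 5      # five-step equations

# Proportions of variance for the first five variables, Trials 1 to 6 (Table 12).
first_five = [0.97, 0.81, 0.82, 0.94, 0.77, 0.98]
adjusted = [adjusted_r2(r2, N_CASES, N_PREDICTORS) for r2 in first_five]

for trial, (raw, adj) in enumerate(zip(first_five, adjusted), start=1):
    print(f"Trial {trial}: R^2 = {raw:.2f}, adjusted = {adj:.3f}")
```

The correction is largest where the sample R² is lowest, and it grows as the number of cases approaches the number of predictors.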

Time 3 was selected in ten equations between the two analyses.
It was never selected higher than second; but as a lower order predictor
it accounted for rather large increases in proportions of variance,
range .10 to .36. The simple correlations between Time 3 and the
criterion ranged from -.36 to .45. In combination with other variables,
Time 3 appears as an important predictor variable. Detailed results,
i.e., ANOVAs and summary tables of the regression equations, are given
in Appendix B.

To  summarize,  it  appeared  that  three  variables  (Time  2,  Time  3, 
and  Sum  1)  might  well  be  used  as  the  basis  of  a five  variable  equation 
for  all  trials.  This  possibility  was  not  tested  directly  (by  forcing 
these  three  variables  into  each  equation  prior  to  any  others)  because 
of  the  pattern  of  selections  of  best  and  second  best  predictors  and  of 
patterns  among  the  first  order  correlations.  Further,  problems  of 
heterogeneity of variances were known to be present in the data (as a
result of the second preliminary investigation) and, consequently, an
effort to obtain a "best" equation seemed unwarranted. Instead, the
subset  of  nine  most  frequently  selected  variables  was  submitted  to  a 
five-step regression analysis.

Results of Five-Step Regressions


From the two preceding regression analyses, it was possible to
develop restricted subsets of the variables. Subsets of seven and nine
variables were developed by eliminating infrequently used variables.
Elimination was based on frequency of use and did not include
information about relative contributions to proportion of variance. To
determine how well equations of not more than five variables from these two
subsets would account for variance in the criterion, two additional
sets of multiple regressions were carried out. Each regression was
limited to five steps and was carried out on each performance trial.
The results were 12 five-variable regression equations.


Regression  on  Seven  Performance  Variables 

A subset of seven performance variables formed the basis of the
first six equations. A summary of the results of these six equations,
one for each performance trial, is given in Table 13. With the
exception of the equation for Trial 1, each equation accounted for at least
50% of the variance in the criterion (range was 34% to 82%). Among the
variables, maximum altitude (MAXALT) was included in each equation in
the present analysis.

In the first analysis (on performance variables), MAXALT was
included in only three equations (Table 11). MAXALT showed the greatest
change in number of inclusions but it had a low mean order of inclusion
(3.67). Sum 2 was included twice in the present analysis, as compared
to three inclusions previously, and it had the lowest mean order (4.5).
On this basis, Sum 2 appears to be a marginal indicator, whereas the
effects of MAXALT are much less clear. More consideration is given to
MAXALT after a consideration of the second set of five-step regression
equations. Detailed tables of ANOVAs and the equations obtained are
given in Appendix C.

TABLE 13

SUMMARY OF FIVE-STEP REGRESSION: SUBSET OF
SEVEN PERFORMANCE VARIABLES

Design
Number   Mnemonic(a)   Orders of selection   f

   2     TOTTIM        3, 1, 4, 1            4
   4     TIME 2        4, 1, 1, 1            4
   5     TIME 3        3, 5, 2, 2            4
   9     MAXALT        5, 5, 3, 2, 3, 4      6
  10     SUM 1         2, 2, 2, 5, 5         5
  11     SUM 2         4, 5                  2
  12     SUM 3         1, 4, 4, 3, 3         5

                             Equations on Trials
                            1     2     3     4     5     6
Proportion of variance:    .34   .55   .82   .58   .77   .63

(a) Names for each variable are given in Table 10.
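
The mean orders of inclusion quoted above can be checked from the orders of selection listed in Table 13. The sketch below does so for MAXALT and Sum 2; the dictionary layout is an illustrative choice, and a larger mean indicates later (lower-order) entry into the equations.

```python
from statistics import mean

# Orders of selection in the five-step equations, as listed in Table 13.
orders = {
    "MAXALT": [5, 5, 3, 2, 3, 4],   # included in all six equations
    "SUM 2": [4, 5],                # included in two equations
}

mean_order = {name: round(mean(values), 2) for name, values in orders.items()}

for name, value in mean_order.items():
    print(f"{name}: {len(orders[name])} inclusions, mean order {value}")
```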


Regression  on  Nine  Variables 

A subset of nine variables was used in the second five-step
regression analysis. In addition to the subset of seven performance
variables, two variables (TOTHRS and STUDY) were added from the set of
personal experience variables. The data in Table 14 show substantial
increases in the proportions of variance compared to those in Table
13; these increases were: Trial 1, .53; Trial 2, .33; and Trial 4,
.36. There were no differences in proportions on Trial 3 (same
equation for both analyses) and Trials 5 and 6 (with one variable different
at the fifth step in each). Detailed results of ANOVAs and summaries
for each of the five-step multiple regression equations are given in
Appendix D.


Summary 

On frequency of inclusion, a shift was observed between Time 3
and MAXALT. As seen in Tables 13 and 14, Time 3 increases from four
to six inclusions, while MAXALT decreases from six to four. Time 2
and TOTTIM are consistently selected as first variable. At first, the
implications of this shifting were not entirely clear. That is, in all
four analyses, and especially those summarized in Tables 12 and 14,
equations differed across trials by kind of variable included. Early
in the performance, i.e., Trials 1 and 2, equations consisted of
composite performance variables (Sums) and prior experience variables.
Later in performance, Trials 5 and 6, specific performance indicators,
i.e., Time 2, Time 3, and TOTTIM, are included more frequently overall
and more frequently as first to third. Review of the outcomes of these
equations (Appendices A to D) revealed that on Trials 5 and 6, the first
2 or 3 variables would account for at least 50% of the variance (Trial
5: 51% with 2, 65% with 3 variables; Trial 6: 55% with 2, 57% with 3
variables).

TABLE 14

SUMMARY OF FIVE-STEP REGRESSION:
SUBSET OF NINE VARIABLES

Design
Number   Mnemonic(a)   Orders of selection   f

   2     TOTTIM        1, 1                  2
   4     TIME 2        1, 1, 1               3
   5     TIME 3        3, 3, 5, 5, 2, 2      6
   9     MAXALT        5, 2, 2, 4            4
  10     SUM 1         2, 2, 4               3
  11     SUM 2         4, 4                  2
  12     SUM 3         1, 4, 4, 5            4
  13     TOTHRS        2, 3, 3               3
  16     STUDY         5, 2, 5               3

                             Equations on Trials
                            1     2     3     4     5     6
Proportion of variance:    .87   .88   .82   .94   .77   .62

(a) Names of each variable are given in Table 10.

Two  facts  about  the  performance  data  would  possibly  account  for 
the  shift  of  variables  in  equations  across  trials.  First,  there  were 
problems  of  heterogeneity  of  variance  on  at  least  some  variables  as 
well as problems of measurement error and error due to ADCS
malfunctions. It was known that the ADCS malfunctions were not uniformly
distributed  across  subjects  or  trials.  Second,  and  more  important, 
the  data  contained  evidence  of  experimental  treatment  effects  and 
effects  of  improvement  due  to  learning  across  trials.  In  particular, 
it  was  known  that  error  amplitude  means  were  curvilinear  across  trials 
and  this  fact  led  to  a final  analysis. 

Analysis  of  Means  Across  Trials 

The  trial  means  of  the  criterion  variable,  error  amplitude, 
exhibited  a definite  curvilinear  pattern.  An  array  of  means  was 
designed  to  examine  possible  relationships  between  trial  means  of  the 
criterion  and  each  of  the  performance  variables  (Table  15).  Simple 
visual inspection revealed that the means of MAXALT (standard deviates,
i.e., z-scores, from the standard of 16,000 feet) were also curvilinear
in a pattern similar to those of error amplitude with one exception,
Trial 4. The two sets of means were plotted (Figure 8) and the
significance of the relationship was immediately obvious.

In  the  MAXALT  means,  the  mean  for  Trial  4 deviated  from  the  best 
fitting  curve  (visual  fit).  A test  of  the  standard  error  of  the  mean 
on  that  trial  revealed  that  a 95%  confidence  interval  would  include  the 
best  fitting  curve.  A new  mean,  2.60,  was  interpolated  for  MAXALT  on 
Trial 4. Two Pearson product-moment correlations were carried out on
the two sets of means across trials. One correlation, r = .98, was
computed with the interpolated mean for MAXALT and the other, r = .72,
with the observed mean.
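
The comparison of the two correlations can be sketched in Python. The trial means below are hypothetical stand-ins (Table 15 is not reproduced here), as is the standard error for Trial 4 and the use of a normal-approximation interval; only the interpolated value of 2.60 comes from the text. The sketch therefore shows the procedure, not the reported values of r.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical trial means (Trials 1 to 6); not the values in Table 15.
error_amplitude = [3.00, 2.40, 2.00, 1.80, 1.70, 1.65]
maxalt_observed = [4.00, 3.20, 2.80, 3.60, 2.40, 2.30]   # Trial 4 departs from the curve

INTERPOLATED_TRIAL_4 = 2.60   # value given in the text
SE_TRIAL_4 = 0.55             # hypothetical standard error of the Trial 4 mean

# Approximate 95% confidence interval around the observed Trial 4 mean.
low = maxalt_observed[3] - 1.96 * SE_TRIAL_4
high = maxalt_observed[3] + 1.96 * SE_TRIAL_4
interval_covers_curve = low <= INTERPOLATED_TRIAL_4 <= high

maxalt_interpolated = maxalt_observed[:3] + [INTERPOLATED_TRIAL_4] + maxalt_observed[4:]

r_observed = pearson_r(error_amplitude, maxalt_observed)
r_interpolated = pearson_r(error_amplitude, maxalt_interpolated)

print(f"interval covers interpolated value: {interval_covers_curve}")
print(f"r with observed mean:     {r_observed:.2f}")
print(f"r with interpolated mean: {r_interpolated:.2f}")
```

With only six pairs of means, a single deviant value has a large effect on r, which is why the two coefficients differ so much.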


(a) This outlier is the observed mean for maximum altitude deviations on Trial 4.
Confidence limits included the interpolated value of 2.60 as shown.


CHAPTER  V 


DISCUSSION  AND  CONCLUSIONS 


The three empirical investigations reported in this study were
designed to obtain information about two specific indicators of
performance skill. The two indicators, performance time and deviations
from a standard, were selected for possible use with a performance
state evaluation model. In these investigations, the results support
the use of these two indicators as indicators of differences in
performances of experienced or student pilots.

In  the  performance  state  evaluation  model  (Figure  7,  page  36), 
performance  time  was  identified  as  a preliminary  indicator.  If  an 
overall  performance  time  deviates  significantly  from  a standard  time 
value, evidence is obtained to conduct a detailed analysis. To conduct
this detailed analysis, deviations between observed and assigned
performance values are used to locate specific performance errors
whenever a performance state time deviates from the associated
standard time value.
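
The two-stage logic just described can be sketched as follows. This is not the algorithm of Figure 7 itself: the function name, the tolerance parameters, and the standard and observed times for the seven performance states are hypothetical, and fixed tolerances stand in for whatever test of a significant deviation is applied in practice.

```python
def evaluate_performance(observed, standard, total_tolerance, state_tolerance):
    """Screen on total time first; examine individual states only if it deviates.

    Returns the total-time deviation and a list of (state number, deviation)
    pairs for states whose time departs from the standard by more than
    state_tolerance. The list is empty when total time is within tolerance.
    """
    total_deviation = sum(observed) - sum(standard)
    if abs(total_deviation) <= total_tolerance:
        return total_deviation, []            # no evidence for a detailed analysis
    flagged = [
        (state, obs - std)
        for state, (obs, std) in enumerate(zip(observed, standard), start=1)
        if abs(obs - std) > state_tolerance
    ]
    return total_deviation, flagged


# Hypothetical times in seconds for seven performance states.
standard_times = [20.0, 60.0, 15.0, 60.0, 15.0, 60.0, 20.0]
observed_times = [21.0, 62.0, 27.0, 61.0, 16.0, 59.0, 20.0]

total_dev, deviant_states = evaluate_performance(
    observed_times, standard_times, total_tolerance=10.0, state_tolerance=5.0
)
print(f"total time deviation: {total_dev:+.1f} s")
for state, deviation in deviant_states:
    print(f"state {state}: {deviation:+.1f} s from standard")
```

Each flagged state marks where deviations between observed and assigned performance values, such as altitude, would then be examined.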

Performance time and deviations from a standard were investigated
as alternatives to summary indicators currently used in pilot
training research and development. A summary indicator, e.g., error
amplitude, was computed as a function of a sum from all observations
in a set of time series data. Some objections to summary indicators
of this nature were (a) lack of sensitivity to a few large deviations
as the number of observations increased, and (b) difficulty of
application, i.e., the need for a high frequency of observations and
computational complexity. The results of these investigations support the
replacement of summary indicators with specific indicators.

There were two major findings. These were: (a) that, with data
from an experiment, the specific indicators were more sensitive to the
effects of differences in experimental treatments than were summary
indicators; and (b) that a small set of specific indicators would
account for moderate (34%) to large (82%) proportions of the variance
in error amplitude in a regression analysis. As an additional
outcome of the regression analysis, it was found that trial means of
a specific indicator, deviation from maximum altitude, exhibited the
same curvilinear trend of improvement in performance as trial means
for error amplitude.

These findings are interesting because they suggest that even
with fewer data points on initial observation, the outcomes of
evaluation will be superior to those obtained using summary indicators. In
terms of the performance state evaluation model, it was found (a) that
performance times would discriminate between individual performances
of experienced pilots or between group performances in a training
experiment; (b) that maximum altitude variances would discriminate between
performances of groups in a training experiment; and (c) that two
variables, Time 3 and maximum altitude, were consistently identified in the
regression equations.

This last finding is especially interesting in terms of the
performance state model. Time 3 represents the time from performance
State 3, transition from climb to maximum altitude in the Vertical S-A
(Figure 2, page 12). Referring to the algorithm (Figure 7, page 36),
the outcomes in these investigations would support the recommended
analysis procedures. First, it was found that total times were
different. Second, maximum altitude is the end point of performance State 3
and Time 3 was found to be a frequently selected variable in the
regression equations.

Further research is needed to determine specifically how often
this combination of outcomes will be observed in individual
performances of the Vertical S-A. Research is also needed to determine other
specific deviations that might be located in Vertical S-A data using
performance time. The outcomes of the first investigation show that
time will locate deviant performance states. The obvious relationship
between mean performance trends on error amplitude and deviations at
maximum altitude suggests the hypothesis that size of deviations on
other performance variables, e.g., airspeed, will be correlated with
performance states.

Within  a performance  state  evaluation  model,  performance  time 
and  deviations  from  standard  flight  path  values  may  also  be  used  by 
instructor  pilots.  For  example,  maximum  altitude  in  the  Vertical  S-A 
would  be  as  easily  observed  by  a human  as  by  an  automated  system.  The 
findings in these investigations would suggest that, for the
operational user, maximum altitude in the Vertical S-A could replace an
overall rating. Confirmation of this hypothesis would result in a
single objective observation replacing a global subjective rating.
To the extent that similar values can be identified in other training
maneuvers, a workable solution will be achieved for the dilemma of
excessive detail versus uninformative generality in pilot training
measurement and evaluation.

To  summarize  the  outcomes  of  the  present  research,  consider 
again the five objectives set out for a training research approach
to  measurement  and  evaluation  studies  in  pilot  training: 

1. To identify potentially critical points or events in
descriptions of pilot behaviors that make up the operational sequence and of
the performance task.

2.  To  develop  observation  schedules  and  scoring  procedures  to 
account  for  the  effects  of  these  events  on  performance  skill. 

3.  To  determine  empirically  relative  frequencies  of  these 
critical  events  throughout  performances  of  the  assigned  task  from 
objective  data. 

4.  To  train  instructor  pilots,  check  pilots,  and  other  pilot 
training  personnel  to  employ  the  schedules  and  procedures  with  student 
pilot performances, first in a simulator, then in the aircraft.

5.  To  develop  reliability  assessment  procedures  for  use  with 
measurement  and  evaluation  practices  in  the  aircraft  based  on  the 
outcomes  of  the  four  preceding  objectives. 

In the present study, the first three objectives were investigated.
In terms of the first objective, the outcomes from this study
were: (a) that existing methods from a maneuver analysis, e.g.,
Brecke and Gerlach (1972), were a suitable basis to determine criteria
and values for the purposes of developing a standard flight path;
(b) that empirical methods must be used to establish estimates of the
standard flight path values; and (c) that critical points within an
operational sequence are most likely to be located in the vicinities
of transition states. Although the analytic methods have only been
applied in the case of one instrument training maneuver, other
researchers have effectively employed similar methods (Knoop & Welde, 1973).
Nevertheless, research is needed to generalize these three findings to
other pilot training maneuvers.

The second objective was the source of a dilemma which motivated
much of the present research: how to obtain indicators of skill that
were intermediate between excessive detail, e.g., high rate time
sampling data from ADCS, and uninformative generality, e.g., global
ratings or summary indicators. To the extent that one can generalize
from the results in the present research, performance time and
deviations from a standard can be effectively combined with a performance
state evaluation model to solve this problem. In particular, it would
appear that superior evaluations can be obtained with fewer data
points on initial observation. In this area, more research is needed
with other maneuvers to refine the procedures for preparing
observation schedules from the performance state model and training
objectives.

In the present study, the analysis algorithm (Figure 7) served
as the basis to determine relative frequencies of critical events,
i.e., objective three above. Results from the present investigations
should be considered as evidence to support the use of this set of
analytic procedures. The evidence for using these procedures is
strongest at the first steps of the algorithm. Extensive diagnostic
analyses were not made because the characteristics of the available
data were not considered adequate for such an effort and because the
collection of new data was beyond the scope of the present study.
Subsequent research in this area might well begin by collecting new data
from both instructor and student pilot performances to carry out such
detailed analyses. As a hypothesis, Shipley, Gerlach, and Brecke
(1974) have suggested that differences in patterns of errors or forms
of a performance in time might be used to develop a scheme to classify
performances.

The  last  two  objectives,  four  and  five,  were  not  included 
directly  in  the  scope  of  the  present  research.  Indirectly,  a secondary 
objective  was  to  develop  measures  and  methods  that  could  be  used  in  the 
operational  and  management  areas  as  well  as  in  training  research  and 
development.  By  implication  from  parallel  research  on  student  pilot 
training  methods,  of  which  this  measurement  and  evaluation  study  was 
a part, specific indicators can be combined with the algorithmic
procedures and training criteria to develop an instructor pilot evaluation
training program. The fifth objective cannot be effectively considered
until the suggested research and development is completed and a
training program is at least in a prototype form.

Another potential contribution of the present study was a
practical application of tests for differences in variances in the area of
training research. In the second investigation, tests for differences
in variances were used to test for differences in the effects of
experimental treatments. These differences in experimental effects were
predicted from a theoretical analysis of performance requirements and the
outcomes of a previous study. Similar uses of variance tests should
be investigated. Distributions of scores may differ in variability
as well as in central tendency or even when there are no differences
in central tendency. Although tests for differences in variances are
legitimate (and possibly even highly informative), it is generally
permissible to ignore them because the more commonly used tests for
differences in means, e.g., t-tests and ANOVA, are robust under the
conditions of moderate violations of homogeneity of variance (Myers,
1966; Winer, 1971). However, in cases where experimental treatments
can be expected to influence variability of performance, as in pilot
training, Winer recommends that variance tests be used.
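
A test for a difference in variances can be sketched with the simple variance ratio. Which variance test was used in the second investigation is not stated in this chapter, so the F ratio is an assumption here, and the two groups of scores are hypothetical. The ratio would be compared with a tabled critical value of F for the degrees of freedom returned; the test is known to be sensitive to departures from normality.

```python
from statistics import mean, variance


def variance_ratio(sample_a, sample_b):
    """Ratio of the larger to the smaller sample variance, with (numerator, denominator) df."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, (len(sample_a) - 1, len(sample_b) - 1)
    return var_b / var_a, (len(sample_b) - 1, len(sample_a) - 1)


# Hypothetical deviation scores for two treatment groups with nearly equal means.
group_a = [2.1, 2.4, 1.9, 2.2, 2.0, 2.3]
group_b = [1.0, 3.5, 2.2, 0.4, 3.9, 2.1]

f_ratio, df = variance_ratio(group_a, group_b)
mean_difference = abs(mean(group_a) - mean(group_b))

print(f"difference in means: {mean_difference:.2f}")
print(f"variance ratio F = {f_ratio:.1f} with df = {df}")
```

In this constructed case the group means are almost identical, so a test on means would show little, while the variance ratio is very large.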

It  is  conceivable  that  similar  effects  might  be  found  in  other 
areas  of  training  and  instruction.  For  example,  differences  in  the 
effects of instructional programs might be better reflected as
differences in variability of achievement than as differences in mean
achievement.  This  potential  effect  would  possibly  be  usable  in  cases 
of  criterion  referenced  tests  and  measures  of  mastery  of  performance 
on  the  same  task  over  time.  In  general,  the  more  complex  the  task, 
the  more  likely  that  changes  in  variances  will  indicate  changes  in 
performances  due  to  training. 

To  conclude,  in  previous  research  on  measures  of  skill  in 
human  performance,  Fitts,  Bahrick,  Briggs,  and  Noble  (1959)  made  this 
observation: 

Of course, every study uses some response measures, but
usually the main purpose of the study is to find out
more about procedural, organismic, or task variables,
and the response measure which reveals these effects is
often chosen on the basis of convenience. The underlying
assumptions in such instances are that response
measures are well understood, and that the various
possible indicants for a given process measure very
nearly the same things, so that one can choose
arbitrarily among them on the basis of convenience.
(p. 6.1)

Later in the same study, these authors conclude "that our
understanding of skilled performance depends upon the development of analytical
indicants  of  performance"  (p.  6.1»7).  Surely  our  understanding  of 
skilled pilot performance depends upon the development of analytical
indicants of performance.